Matthieu Cord
SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization
Tokenizers are a key component of state-of-the-art generative image models, extracting the most important features from the signal while reducing data dimension and redundancy. Most current tokenizers are based on KL-regularized variationa…
IPA: An Information-Reconstructive Input Projection Framework for Efficient Foundation Model Adaptation
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, reduce adaptation cost by injecting low-rank updates into pretrained weights. However, LoRA's down-projection is randomly initialized and data-agnostic, discarding potentially u…
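The low-rank update mechanism this abstract refers to can be sketched in a few lines. This is an illustrative, generic LoRA-style sketch under assumed shapes, not the paper's IPA initialization: a frozen weight W is augmented with a trainable low-rank product B @ A, with B zero-initialized so the adapted model starts identical to the pretrained one.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 8, 8, 2
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # down-projection (trainable; randomly initialized in vanilla LoRA)
B = np.zeros((d_out, rank))                   # up-projection, zero-init so the update starts at zero

def adapted_forward(x):
    # Adapted layer: x @ (W + B A)^T; only A and B would be trained.
    return x @ (W + B @ A).T

x = rng.standard_normal((1, d_in))
# With B zero-initialized, the adapted output equals the frozen output.
assert np.allclose(adapted_forward(x), x @ W.T)
```

The random, data-agnostic initialization of A is exactly the design choice the abstract criticizes.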
Learning to Steer: Input-dependent Steering for Multimodal LLMs
Steering has emerged as a practical approach to enable post-hoc guidance of LLMs towards enforcing a specific behavior. However, it remains largely underexplored for multimodal LLMs (MLLMs); furthermore, existing steering techniques, such …
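A common form of the steering this abstract builds on is activation steering: add a direction to the model's hidden states at inference time. The toy sketch below (hypothetical names, mean-difference vector, fixed scale; not the paper's input-dependent method) shows the basic mechanic.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # hidden-state dimension (toy size)

# Hidden states collected under the desired behavior vs. otherwise.
pos = rng.standard_normal((32, d)) + 1.0
neg = rng.standard_normal((32, d))

# Steering vector: mean difference between the two activation sets.
steer = pos.mean(axis=0) - neg.mean(axis=0)

def apply_steering(h, alpha=1.0):
    # Add the scaled steering direction to every token's hidden state.
    return h + alpha * steer

h = rng.standard_normal((4, d))          # hidden states for 4 tokens
h_steered = apply_steering(h, alpha=0.5)
```

A fixed `steer` and `alpha` for all inputs is exactly the input-independence the paper identifies as a limitation.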
FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models
Foundation models have exhibited unprecedented capabilities in tackling many domains and tasks. Models such as CLIP are currently widely used to bridge cross-modal representations, and text-to-image diffusion models are arguably the leadin…
JAFAR: Jack up Any Feature at Any Resolution
Foundation Vision Encoders have become essential for a wide range of dense vision tasks. However, their low-resolution spatial feature outputs necessitate feature upsampling to produce the high-resolution modalities required for downstream…
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Vision-language models (VLMs) pretrained on large-scale multimodal datasets encode rich visual and linguistic knowledge, making them a strong foundation for robotics. Rather than training robotic policies from scratch, recent approaches ad…
Scaling Laws for Native Multimodal Models
Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders …
GaussRender: Learning 3D Occupancy with Gaussian Rendering
Understanding the 3D geometry and semantics of driving scenes is critical for safe autonomous driving. Recent advances in 3D occupancy prediction have improved scene representation but often suffer from spatial inconsistencies, leading to …
Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions. This is often due to limitations such as complex architectures customized for a specific dataset and inef…
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
Multimodal LLMs (MLLMs) have reached remarkable levels of proficiency in understanding multimodal inputs. However, understanding and interpreting the behavior of such complex models is a challenging task, not to mention the dynamic shifts …
PPT: Pretraining with Pseudo-Labeled Trajectories for Motion Forecasting
Accurately predicting how agents move in dynamic scenes is essential for safe autonomous driving. State-of-the-art motion forecasting models rely on large curated datasets with manually annotated or heavily post-processed trajectories. How…
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Understanding deep models is crucial for deploying them in safety-critical applications. We introduce GIFT, a framework for deriving post-hoc, global, interpretable, and faithful textual explanations for vision classifiers. GIFT starts fro…
OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models
We consider the problem of text-to-video generation with precise control for various applications such as camera movement control and video-to-video editing. Most methods tackling this problem rely on providing user-defined controls, …
Skipping Computations in Multimodal LLMs
Large Language Models (LLMs) have demonstrated remarkable success in both textual and multimodal domains. However, this success often comes with substantial computational costs, particularly when handling lengthy sequences of multimodal in…
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Vision Language Models (VLMs) have demonstrated remarkable capabilities in various open-vocabulary tasks, yet their zero-shot performance lags behind task-specific fine-tuned models, particularly in complex tasks like Referring Expression …
Annealed Winner-Takes-All for Motion Forecasting
In autonomous driving, motion prediction aims at forecasting the future trajectories of nearby agents, helping the ego vehicle to anticipate behaviors and drive safely. A key challenge is generating a diverse set of future predictions, com…
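The multi-hypothesis setting this abstract describes is typically trained with a winner-takes-all (WTA) loss: of K candidate trajectories, only the one closest to the ground truth receives the loss, which lets the others stay diverse. A toy sketch (illustrative shapes; the paper's annealing schedule is not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 6, 12                              # hypotheses, timesteps (toy sizes)
preds = rng.standard_normal((K, T, 2))    # K candidate 2D trajectories
gt = rng.standard_normal((T, 2))          # ground-truth trajectory

# Average displacement error (ADE) of each hypothesis vs. the ground truth.
ade = np.linalg.norm(preds - gt, axis=-1).mean(axis=-1)  # shape (K,)

# Winner-takes-all: only the best hypothesis contributes to the loss.
winner = int(ade.argmin())
wta_loss = float(ade[winner])
```

Annealing softens this hard argmin early in training so that all hypotheses receive gradient before the winner dominates.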
ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable
Machine learning based autonomous driving systems often face challenges with safety-critical scenarios that are rare in real-world data, hindering their large-scale deployment. While increasing real-world training data coverage could addre…
Valeo4Cast: A Modular Approach to End-to-End Forecasting
Motion forecasting is crucial in autonomous driving systems to anticipate the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic signals. In end-to-end forecasting, the model must jointly detect and track …
A Concept-Based Explainability Framework for Large Multimodal Models
Large multimodal models (LMMs) combine unimodal encoders and large language models (LLMs) to perform multimodal tasks. Despite recent advancements towards the interpretability of these models, understanding internal representations of LMMs…
DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut
Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks. While prior works have addressed unsupervised image segmentation, they significantly lag behind supervised models. In…
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Large Language Models (LLMs) have demonstrated impressive performance on multimodal tasks, without any multimodal finetuning. They are the building block for Large Multimodal Models, yet we still lack a proper understanding of their succe…
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive?
What matters when building vision-language models?
The growing interest in vision-language models (VLMs) has been driven by improvements in large language models and vision transformers. Despite the abundance of literature on this subject, we observe that critical decisions regarding the d…
What Makes Multimodal In-Context Learning Work?
Large Language Models have demonstrated remarkable performance across various tasks, exhibiting the capacity to swiftly acquire new skills, such as through In-Context Learning (ICL) with minimal demonstration examples. In this work, we pre…
Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI
The reconstruction of images observed by subjects from fMRI data collected during visual stimuli has made strong progress in the past decade, thanks to the availability of extensive fMRI datasets and advancements in generative models for i…
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
Vehicle trajectory prediction has increasingly relied on data-driven solutions, but their ability to scale to different data domains and the impact of larger dataset sizes on their generalization remain under-explored. While these question…
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
The abilities of large language models (LLMs) have recently progressed to unprecedented levels, paving the way to novel applications in a wide variety of areas. In computer vision, LLMs can be used to prime vision-language tasks such as image…
GradPaint: Gradient-guided inpainting with diffusion models
Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation. The pre-trained models can be adapted without further training to different downstream tasks, by …
Manipulating Trajectory Prediction with Backdoors
Autonomous vehicles ought to predict the surrounding agents' trajectories to allow safe maneuvers in uncertain and complex traffic situations. As companies increasingly apply trajectory prediction in the real world, security becomes a rele…