Florian Bordes
What's in Common? Multimodal Models Hallucinate When Reasoning Across Scenes
Multimodal language models possess a remarkable ability to handle an open-vocabulary's worth of objects. Yet the best models still suffer from hallucinations when reasoning about scenes in the real world, revealing a gap between their seem…
Object-centric Binding in Contrastive Language-Image Pretraining
Recent advances in vision-language models (VLMs) have been driven by contrastive models such as CLIP, which learn to associate visual information with its corresponding text descriptions. However, these models have limitations in understa…
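The contrastive objective CLIP-style models train with can be sketched in a few lines: paired image and text embeddings are pulled together while mismatched pairs in the batch are pushed apart via a symmetric cross-entropy over cosine similarities. This is a minimal numpy illustration of the standard InfoNCE-style loss, not the implementation from any of the papers above; the function name and temperature value are illustrative.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings.

    Row i of image_emb and row i of text_emb are assumed to be a matching pair.
    """
    # L2-normalize so dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise similarity logits; diagonal entries correspond to matched pairs.
    logits = image_emb @ text_emb.T / temperature
    n = logits.shape[0]

    def cross_entropy_diag(l):
        # Numerically stable log-softmax along rows; targets are the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy_diag(logits) + cross_entropy_diag(logits.T)) / 2
```

With perfectly aligned pairs (identical, orthogonal embeddings) the loss approaches zero; for random embeddings it stays near log of the batch size.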
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
Multimodal Large Language Models (MLLMs) have shown promising progress in understanding and analyzing video content. However, processing long videos remains a significant challenge, constrained by the LLM's context size. To address this limitat…
An Introduction to Vision-Language Modeling
Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models t…
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Curation methods for massive vision-language datasets trade off between dataset size and quality. However, even the highest-quality available curated captions are far too short to capture the rich visual detail in an image. To show the …
Feedback-guided Data Synthesis for Imbalanced Classification
The current status quo in machine learning is to use static datasets of real images for training, which often come from long-tailed distributions. With the recent advances in generative models, researchers have started augmenting these static …
PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning
Synthetic image datasets offer unmatched advantages for designing and evaluating deep neural networks: they make it possible to (i) render as many data samples as needed, (ii) precisely control each scene and yield granular ground truth la…
Stochastic positional embeddings improve masked image modeling
Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predict…
Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning
Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another. However, when taken to the extreme, SSL models can unintentionally memorize specif…
Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations
Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their…
A Cookbook of Self-Supervised Learning
Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are famil…
A surprisingly simple technique to control the pretraining bias for better transfer: Expand or Narrow your representation
Self-Supervised Learning (SSL) models rely on a pretext task to learn representations. Because this pretext task differs from the downstream tasks used to evaluate the performance of these models, there is an inherent misalignment or pretr…
Towards Democratizing Joint-Embedding Self-Supervised Learning
Joint Embedding Self-Supervised Learning (JE-SSL) has seen rapid developments in recent years, due to its promise to effectively leverage large unlabeled data. The development of JE-SSL methods was driven primarily by the search for ever i…
The Hidden Uniform Cluster Prior in Self-Supervised Learning
A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g., SimCLR, VICReg, SwAV, MSN). We show that in the formulation of all these methods is an overlooked …
Guillotine Regularization: Why removing layers is needed to improve generalization in Self-Supervised Learning
One unexpected technique that emerged in recent years consists of training a Deep Network (DN) with a Self-Supervised Learning (SSL) method, and using this network on downstream tasks but with its last few projector layers entirely removed…
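The pattern the abstract describes — pretrain with a projector head, then discard it and transfer the backbone features — can be sketched as follows. This is a toy numpy illustration of the workflow only, with random linear maps standing in for trained networks; the `Backbone` and `Projector` classes and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class Backbone:
    """Stand-in encoder: a random linear map plus ReLU (hypothetical)."""
    def __init__(self, d_in, d_feat):
        self.W = rng.standard_normal((d_in, d_feat)) / np.sqrt(d_in)
    def __call__(self, x):
        return np.maximum(x @ self.W, 0.0)

class Projector:
    """SSL projector head, used only while computing the pretraining loss."""
    def __init__(self, d_feat, d_proj):
        self.W = rng.standard_normal((d_feat, d_proj)) / np.sqrt(d_feat)
    def __call__(self, h):
        return h @ self.W

backbone, projector = Backbone(32, 64), Projector(64, 16)
x = rng.standard_normal((4, 32))

# During pretraining, the SSL objective sees the projected embeddings.
z_pretrain = projector(backbone(x))
# For downstream transfer, the projector is "guillotined": only backbone
# features are kept and fed to a linear probe or fine-tuned head.
h_transfer = backbone(x)
```

The point of the paper is precisely why this removal helps; the sketch just makes the two feature spaces (projected vs. backbone) concrete.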
Masked Siamese Networks for Label-Efficient Learning
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the ori…
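The core idea — match the representation of a randomly masked view to that of the full view — can be illustrated with a toy numpy sketch. This is not the MSN implementation (which matches views via soft cluster prototypes, not direct cosine similarity); the mean-pooling "encoder" and all shapes here are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(patches, mask_ratio=0.5):
    """Randomly drop a fraction of patch tokens to form the masked view."""
    n = patches.shape[0]
    keep = rng.permutation(n)[: int(n * (1 - mask_ratio))]
    return patches[keep]

def represent(patches):
    # Toy encoder: mean-pool patch features (stand-in for a ViT encoder).
    return patches.mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

patches = rng.standard_normal((16, 8))     # 16 patch tokens, 8-dim features
anchor = represent(mask_patches(patches))  # representation of the masked view
target = represent(patches)                # representation of the full view
# The training signal encourages these two representations to agree.
similarity = cosine(anchor, target)
```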
High Fidelity Visualization of What Your Self-Supervised Representation Knows About
Discovering what is learned by neural networks remains a challenge. In self-supervised learning, classification is the most common task used to evaluate how good a representation is. However, relying only on such a downstream task can limit …
Learning to sample from noise with deep generative models
Machine learning, and especially deep learning, has established itself in recent years as a way to solve a wide variety of tasks. One of the most remarkable applications concerns computer vision. The systems …
Learning to Generate Samples from Noise through Infusion Training
In this work, we investigate a novel training procedure to learn a generative model as the transition operator of a Markov chain, such that, when applied repeatedly on an unstructured random noise sample, it will denoise it into a sample t…
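The mechanism the abstract describes — a learned transition operator that, applied repeatedly to unstructured noise, denoises it toward a data sample — can be shown with a toy one-dimensional analogue. This is only an illustration of repeated transition-operator sampling, not Infusion Training itself; the hand-written contraction toward a fixed `target` stands in for a trained denoising network.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([2.0, -1.0, 0.5])  # hypothetical data point the chain should reach

def transition(x, step=0.3, noise=0.05):
    """Toy transition operator of the Markov chain: contract the state
    toward the target with a little injected noise (stand-in for a
    learned denoising step)."""
    return x + step * (target - x) + noise * rng.standard_normal(x.shape)

# Start from unstructured random noise and apply the operator repeatedly;
# the chain drifts toward (a noisy neighborhood of) the target.
x = rng.standard_normal(3)
for _ in range(50):
    x = transition(x)
```

After enough steps the state fluctuates in a small neighborhood of the target, which is the qualitative behavior the paper trains a network to produce on real data.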