Alexandre Ramé
MedGemma Technical Report
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment faces challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that p…
On Teacher Hacking in Language Model Distillation
Post-training of language models (LMs) increasingly relies on the following two stages: (i) knowledge distillation, where the LM is trained to imitate a larger teacher LM, and (ii) reinforcement learning from human feedback (RLHF), where t…
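Although the abstract is truncated here, the distillation stage it refers to is standard enough to sketch: the student is trained to imitate the teacher's next-token distribution, typically via a KL divergence. A minimal sketch follows; tensor names and the temperature handling are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Token-level KL(teacher || student) on next-token distributions.

    Both tensors have shape (batch, seq_len, vocab_size); the teacher is
    frozen, so its logits are detached. The KL is summed over the vocabulary
    and sequence and averaged over the batch.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```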
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
Training of large language models (LLMs) is typically distributed across a large number of accelerators to reduce training time. Since internal states and parameter gradients need to be exchanged at each and every single gradient step, all…
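The snippet below is a single-process sketch of the plain DiLoCo-style two-level optimization that this work extends: each replica takes many local steps before any synchronization, and an outer step applies the averaged parameter delta. The streaming, partial synchronization, and overlap of communication with computation that give the paper its title are not shown, and the optimizer choices are simplified.

```python
import copy
import torch
import torch.nn.functional as F

def diloco_outer_round(global_model, replicas, shards, inner_steps=50,
                       inner_lr=1e-4, outer_lr=0.7):
    """One outer round of DiLoCo-style training (simplified, single process).

    Each replica is reset to the global weights, runs `inner_steps` local AdamW
    updates on its own data shard, and the outer step moves the global weights
    along the average of the resulting deltas (the paper uses Nesterov momentum
    for the outer optimizer; plain SGD is used here for brevity). Assumes all
    state_dict entries are floating point.
    """
    global_state = copy.deepcopy(global_model.state_dict())
    deltas = []
    for replica, shard in zip(replicas, shards):
        replica.load_state_dict(global_state)
        opt = torch.optim.AdamW(replica.parameters(), lr=inner_lr)
        for x, y in list(shard)[:inner_steps]:   # shard yields (inputs, targets) batches
            opt.zero_grad()
            F.cross_entropy(replica(x), y).backward()
            opt.step()
        local_state = replica.state_dict()
        deltas.append({k: global_state[k] - local_state[k] for k in global_state})
    with torch.no_grad():
        new_state = {k: global_state[k] - outer_lr * sum(d[k] for d in deltas) / len(deltas)
                     for k in global_state}
    global_model.load_state_dict(new_state)
```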
Diversity-Rewarded CFG Distillation
Generative models are transforming creative domains such as music generation, with inference-time strategies like Classifier-Free Guidance (CFG) playing a crucial role. However, CFG doubles inference cost while limiting originality and div…
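For reference, the classifier-free guidance operation being distilled combines a conditional and an unconditional prediction at every inference step, which is why it doubles the cost. A minimal sketch of standard CFG (not of the distillation itself):

```python
import torch

def cfg_logits(cond_logits: torch.Tensor,
               uncond_logits: torch.Tensor,
               guidance_scale: float = 3.0) -> torch.Tensor:
    """Classifier-free guidance: extrapolate from the unconditional prediction
    towards the conditional one. Requires two forward passes per step, which
    is the doubled inference cost that distillation aims to remove."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```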
Gemma 2: Improving Open Language Models at a Practical Size
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modificati…
Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible…
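The abstract is cut off before describing the mechanism, so the sketch below only illustrates the rigid baseline such steerable approaches improve on: linearly scalarizing the conflicting rewards with fixed weights, which requires retraining for every new trade-off, whereas a steerable model is meant to cover the whole trade-off at inference time. The function is purely illustrative, not CLP's conditioning mechanism.

```python
def scalarized_reward(rewards: dict, weights: dict) -> float:
    """Linear scalarization of conflicting objectives (e.g. creativity vs. safety).

    Training a separate policy for every weight vector is the inflexible
    baseline that steerable multi-objective finetuning aims to avoid.
    """
    assert set(rewards) == set(weights), "one weight per objective"
    return sum(weights[name] * value for name, value in rewards.items())

# Example: favour safety twice as much as creativity.
print(scalarized_reward({"creativity": 0.8, "safety": 0.6},
                        {"creativity": 1.0, "safety": 2.0}))
```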
BOND: Aligning LLMs with Best-of-N Distillation
Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in state-of-the-art large language models. Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best ge…
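A minimal sketch of the Best-of-N strategy the paper distills: sample N candidates, score them with a reward model, keep the best. Here `policy.generate` and `reward_model.score` are illustrative interfaces, not a specific library's API.

```python
import torch

def best_of_n(policy, reward_model, prompt, n=16):
    """Best-of-N sampling: draw n candidates and return the highest-reward one.

    BOND aims to distill the distribution induced by this procedure back into
    the policy, so that a single sample at inference time behaves like the
    best of n, without paying the n-fold inference cost.
    """
    candidates = [policy.generate(prompt) for _ in range(n)]
    scores = torch.tensor([reward_model.score(prompt, c) for c in candidates])
    return candidates[int(scores.argmax())]
```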
WARP: On the Benefits of Weight Averaged Rewarded Policies
Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) by encouraging their generations to have high rewards, using a reward model trained on human preferences. To prevent the forgetting of pre-trained knowle…
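The abstract is truncated just as it turns to forgetting of pre-trained knowledge; one weight-space operation consistent with the title is to interpolate the rewarded policy back towards its initialization. The sketch below shows only that linear interpolation and omits the other merging stages the paper combines.

```python
import torch

@torch.no_grad()
def interpolate_towards_init(policy, init_state, alpha=0.3):
    """theta <- (1 - alpha) * theta_rlhf + alpha * theta_init.

    alpha trades reward (alpha -> 0) against retained pre-trained knowledge
    (alpha -> 1). Assumes floating-point weights only.
    """
    merged = {name: (1 - alpha) * param + alpha * init_state[name]
              for name, param in policy.state_dict().items()}
    policy.load_state_dict(merged)
    return policy
```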
Direct Language Model Alignment from Online AI Feedback
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datase…
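The truncated sentence concerns where the preference pairs come from; the DAP objective itself is unchanged. Below is a sketch of the standard DPO loss on one pair, with the assumption that the chosen/rejected labels are produced online by an LLM annotator on freshly sampled responses rather than read from a fixed dataset.

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under the
    trained policy (logp_*) or the frozen reference policy (ref_logp_*).
    """
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin))
```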
WARM: On the Benefits of Weight Averaged Reward Models
Aligning large language models (LLMs) with human preferences through reinforcement learning (RLHF) can lead to reward hacking, where LLMs exploit failures in the reward model (RM) to achieve seemingly high rewards without meeting the under…
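A minimal sketch of the weight averaging suggested by the title: several reward models fine-tuned from a shared pre-trained initialization are merged by averaging their weights, and the merged model is then used to score candidates during RLHF. Identical architectures and floating-point state are assumed.

```python
import copy
import torch

@torch.no_grad()
def merge_reward_models(reward_models):
    """Uniform weight average of reward models sharing an initialization."""
    states = [rm.state_dict() for rm in reward_models]
    avg = {k: sum(s[k] for s in states) / len(states) for k in states[0]}
    merged = copy.deepcopy(reward_models[0])
    merged.load_state_dict(avg)
    return merged
```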
Diverse and efficient ensembling of deep networks
This thesis aims at enhancing the generalization abilities of deep neural networks, a critical step towards fair and reliable artificial intelligence. Specifically, we address the drop in performance when models are evaluated on test sampl…
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with …
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Large Language Models (LLMs) have brought the ambitious quest for generalist agents significantly closer to reality. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities. A promising …
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfect…
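A minimal sketch of the interpolation in the title: policies fine-tuned on different rewards, all from the same pre-trained initialization, are combined linearly in weight space, and sweeping the interpolation coefficients at deployment time traces a trade-off curve between the rewards without retraining. Floating-point state is assumed.

```python
import copy
import torch

@torch.no_grad()
def rewarded_soup(policies, lambdas):
    """Return a model whose weights are sum_i lambda_i * theta_i."""
    assert abs(sum(lambdas) - 1.0) < 1e-6 and all(lam >= 0 for lam in lambdas)
    states = [p.state_dict() for p in policies]
    mixed = {k: sum(lam * s[k] for lam, s in zip(lambdas, states)) for k in states[0]}
    soup = copy.deepcopy(policies[0])
    soup.load_state_dict(mixed)
    return soup

# e.g. trace the front between two rewards by sweeping lambda in [0, 1]:
# soups = [rewarded_soup([pi_a, pi_b], [lam, 1 - lam]) for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
```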
Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: from a pre-trained foundation model, they fine-tune the weights on the target task of inter…
Towards efficient feature sharing in MIMO architectures
Multi-input multi-output architectures propose to train multiple subnetworks within one base network and then average the subnetwork predictions to benefit from ensembling for free. Despite some relative success, these architectures are wa…
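A toy sketch of the multi-input multi-output idea summarized above: during training, M inputs share one backbone and M heads, and at test time the same input is fed to every slot so the M predictions can be averaged as a free ensemble. The architecture below is deliberately tiny and illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class TinyMIMO(nn.Module):
    """Two-input, two-output classifier sharing a single backbone."""
    def __init__(self, in_dim=784, hidden=256, num_classes=10, m=2):
        super().__init__()
        self.m = m
        self.backbone = nn.Sequential(nn.Linear(m * in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, num_classes) for _ in range(m))

    def forward(self, xs):
        # xs: list of m tensors, one per subnetwork input (identical at test time).
        features = self.backbone(torch.cat(xs, dim=-1))
        return [head(features) for head in self.heads]

# Test time: feed the same input m times and average the m predictions.
model = TinyMIMO()
x = torch.randn(8, 784)
logits = torch.stack(model([x, x]), dim=0).mean(0)
```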
Diverse Weight Averaging for Out-of-Distribution Generalization
Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strate…
DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
Deep network architectures struggle to continually learn new tasks without forgetting the previous tasks. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting effici…
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
Recent strategies achieved ensembling "for free" by fitting concurrently diverse subnetworks inside a single base network. The main idea during training is that each subnetwork learns to classify only one of the multiple inputs simultaneou…
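A minimal sketch of the mixing step in the linear variant: the two embedded inputs are interpolated before the shared trunk, with a factor chosen to keep the mixed features at roughly single-input scale. The paper's preferred variant instead mixes with binary CutMix-like masks and reweights the two classification losses accordingly; none of that is shown here.

```python
import torch

def linear_mixmo_mix(feat_a: torch.Tensor, feat_b: torch.Tensor, lam: float) -> torch.Tensor:
    """Mix the embeddings of the two subnetworks' inputs (Linear-MixMo style).

    lam is typically drawn from a Beta distribution at each training step;
    each of the two heads then classifies its own original input.
    """
    return 2.0 * (lam * feat_a + (1.0 - lam) * feat_b)
```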
Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Learning robust models that generalize well under changes in the data distribution is critical for real-world applications. To this end, there has been a growing surge of interest to learn simultaneously from multiple training domains - wh…
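A simplified sketch of the kind of penalty the title refers to: compute per-sample gradients in each training domain, take their variance, and penalize the gap between each domain's variance and the mean across domains. The loop over samples is deliberately naive; in practice this is restricted to the classifier's parameters and computed with efficient per-sample gradient tooling.

```python
import torch

def fishr_style_penalty(model, losses_per_domain, params=None):
    """Match the variance of per-sample gradients across training domains.

    `losses_per_domain` is a list (one entry per domain) of 1-D tensors of
    per-example losses, computed with the graph still attached.
    """
    if params is None:
        params = [p for p in model.parameters() if p.requires_grad]
    variances = []
    for losses in losses_per_domain:
        per_sample_grads = []
        for loss in losses:
            g = torch.autograd.grad(loss, params, retain_graph=True, create_graph=True)
            per_sample_grads.append(torch.cat([gi.flatten() for gi in g]))
        grads = torch.stack(per_sample_grads)              # (n_samples, n_params)
        variances.append(grads.var(dim=0, unbiased=False))
    mean_variance = torch.stack(variances).mean(dim=0)
    # Penalize each domain's deviation from the average gradient variance.
    return sum(((v - mean_variance) ** 2).mean() for v in variances)
```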
DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation
Deep ensembles perform better than a single network thanks to the diversity among their members. Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performances. In t…
CORE: Color Regression for Multiple Colors Fashion Garments
Among all fashion attributes, color is challenging to detect due to its subjective perception. Existing classification approaches can not go beyond the predefined list of discrete color names. In this paper, we argue that color detection i…
CoRe: Color Regression for Multicolor Fashion Garments
Developing deep networks that analyze fashion garments has many real-world applications. Among all fashion attributes, color is one of the most important yet challenging to detect. Existing approaches are classification-based and thus cann…
OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation
Object detectors tend to perform poorly in new or open domains, and require exhaustive yet costly annotations from fully labeled datasets. We aim at benefiting from several datasets with different categories but without additional label…