Alexandre Ramé
MedGemma Technical Report
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment faces challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that p…
On Teacher Hacking in Language Model Distillation
Post-training of language models (LMs) increasingly relies on the following two stages: (i) knowledge distillation, where the LM is trained to imitate a larger teacher LM, and (ii) reinforcement learning from human feedback (RLHF), where t…
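Although the abstract is truncated here, the distillation stage it refers to is standard enough to sketch: the student is trained to imitate the teacher's next-token distribution, typically via a KL divergence. A minimal sketch follows; tensor names and the temperature handling are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Token-level KL(teacher || student) on next-token distributions.

    Both tensors have shape (batch, seq_len, vocab_size); the teacher is
    frozen, so its logits are detached. The KL is summed over the vocabulary
    and sequence and averaged over the batch.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```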
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
Training of large language models (LLMs) is typically distributed across a large number of accelerators to reduce training time. Since internal states and parameter gradients need to be exchanged at each and every single gradient step, all…
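The snippet below is a single-process sketch of the plain DiLoCo-style two-level optimization that this work extends: each replica takes many local steps before any synchronization, and an outer step applies the averaged parameter delta. The streaming, partial synchronization, and overlap of communication with computation that give the paper its title are not shown, and the optimizer choices are simplified.

```python
import copy
import torch
import torch.nn.functional as F

def diloco_outer_round(global_model, replicas, shards, inner_steps=50,
                       inner_lr=1e-4, outer_lr=0.7):
    """One outer round of DiLoCo-style training (simplified, single process).

    Each replica is reset to the global weights, runs `inner_steps` local AdamW
    updates on its own data shard, and the outer step moves the global weights
    along the average of the resulting deltas (the paper uses Nesterov momentum
    for the outer optimizer; plain SGD is used here for brevity). Assumes all
    state_dict entries are floating point.
    """
    global_state = copy.deepcopy(global_model.state_dict())
    deltas = []
    for replica, shard in zip(replicas, shards):
        replica.load_state_dict(global_state)
        opt = torch.optim.AdamW(replica.parameters(), lr=inner_lr)
        for x, y in list(shard)[:inner_steps]:   # shard yields (inputs, targets) batches
            opt.zero_grad()
            F.cross_entropy(replica(x), y).backward()
            opt.step()
        local_state = replica.state_dict()
        deltas.append({k: global_state[k] - local_state[k] for k in global_state})
    with torch.no_grad():
        new_state = {k: global_state[k] - outer_lr * sum(d[k] for d in deltas) / len(deltas)
                     for k in global_state}
    global_model.load_state_dict(new_state)
```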
Diversity-Rewarded CFG Distillation
Generative models are transforming creative domains such as music generation, with inference-time strategies like Classifier-Free Guidance (CFG) playing a crucial role. However, CFG doubles inference cost while limiting originality and div…
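For reference, the classifier-free guidance operation being distilled combines a conditional and an unconditional prediction at every inference step, which is why it doubles the cost. A minimal sketch of standard CFG (not of the distillation itself):

```python
import torch

def cfg_logits(cond_logits: torch.Tensor,
               uncond_logits: torch.Tensor,
               guidance_scale: float = 3.0) -> torch.Tensor:
    """Classifier-free guidance: extrapolate from the unconditional prediction
    towards the conditional one. Requires two forward passes per step, which
    is the doubled inference cost that distillation aims to remove."""
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```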
Gemma 2: Improving Open Language Models at a Practical Size
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modificati…
Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible…
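The abstract is cut off before describing the mechanism, so the sketch below only illustrates the rigid baseline such steerable approaches improve on: linearly scalarizing the conflicting rewards with fixed weights, which requires retraining for every new trade-off, whereas a steerable model is meant to cover the whole trade-off at inference time. The function is purely illustrative, not CLP's conditioning mechanism.

```python
def scalarized_reward(rewards: dict, weights: dict) -> float:
    """Linear scalarization of conflicting objectives (e.g. creativity vs. safety).

    Training a separate policy for every weight vector is the inflexible
    baseline that steerable multi-objective finetuning aims to avoid.
    """
    assert set(rewards) == set(weights), "one weight per objective"
    return sum(weights[name] * value for name, value in rewards.items())

# Example: favour safety twice as much as creativity.
print(scalarized_reward({"creativity": 0.8, "safety": 0.6},
                        {"creativity": 1.0, "safety": 2.0}))
```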
BOND: Aligning LLMs with Best-of-N Distillation
Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in state-of-the-art large language models. Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best ge…
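A minimal sketch of the Best-of-N strategy the paper distills: sample N candidates, score them with a reward model, keep the best. Here `policy.generate` and `reward_model.score` are illustrative interfaces, not a specific library's API.

```python
import torch

def best_of_n(policy, reward_model, prompt, n=16):
    """Best-of-N sampling: draw n candidates and return the highest-reward one.

    BOND aims to distill the distribution induced by this procedure back into
    the policy, so that a single sample at inference time behaves like the
    best of n, without paying the n-fold inference cost.
    """
    candidates = [policy.generate(prompt) for _ in range(n)]
    scores = torch.tensor([reward_model.score(prompt, c) for c in candidates])
    return candidates[int(scores.argmax())]
```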
WARP: On the Benefits of Weight Averaged Rewarded Policies
Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) by encouraging their generations to have high rewards, using a reward model trained on human preferences. To prevent the forgetting of pre-trained knowle…
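The abstract is truncated just as it turns to forgetting of pre-trained knowledge; one weight-space operation consistent with the title is to interpolate the rewarded policy back towards its initialization. The sketch below shows only that linear interpolation and omits the other merging stages the paper combines.

```python
import torch

@torch.no_grad()
def interpolate_towards_init(policy, init_state, alpha=0.3):
    """theta <- (1 - alpha) * theta_rlhf + alpha * theta_init.

    alpha trades reward (alpha -> 0) against retained pre-trained knowledge
    (alpha -> 1). Assumes floating-point weights only.
    """
    merged = {name: (1 - alpha) * param + alpha * init_state[name]
              for name, param in policy.state_dict().items()}
    policy.load_state_dict(merged)
    return policy
```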
Direct Language Model Alignment from Online AI Feedback
Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datase…
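The truncated sentence concerns where the preference pairs come from; the DAP objective itself is unchanged. Below is a sketch of the standard DPO loss on one pair, with the assumption that the chosen/rejected labels are produced online by an LLM annotator on freshly sampled responses rather than read from a fixed dataset.

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under the
    trained policy (logp_*) or the frozen reference policy (ref_logp_*).
    """
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin))
```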
WARM: On the Benefits of Weight Averaged Reward Models
Aligning large language models (LLMs) with human preferences through reinforcement learning (RLHF) can lead to reward hacking, where LLMs exploit failures in the reward model (RM) to achieve seemingly high rewards without meeting the under…
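A minimal sketch of the weight averaging suggested by the title: several reward models fine-tuned from a shared pre-trained initialization are merged by averaging their weights, and the merged model is then used to score candidates during RLHF. Identical architectures and floating-point state are assumed.

```python
import copy
import torch

@torch.no_grad()
def merge_reward_models(reward_models):
    """Uniform weight average of reward models sharing an initialization."""
    states = [rm.state_dict() for rm in reward_models]
    avg = {k: sum(s[k] for s in states) / len(states) for k in states[0]}
    merged = copy.deepcopy(reward_models[0])
    merged.load_state_dict(avg)
    return merged
```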
Diverse and efficient ensembling of deep networks
This thesis aims at enhancing the generalization abilities of deep neural networks, a critical step towards fair and reliable artificial intelligence. Specifically, we address the drop in performance when models are evaluated on test sampl…
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with …
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Large Language Models (LLMs) have brought the ambitious quest for generalist agents significantly closer to reality. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities. A promising …
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfect…
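A minimal sketch of the interpolation in the title: policies fine-tuned on different rewards, all from the same pre-trained initialization, are combined linearly in weight space, and sweeping the interpolation coefficients at deployment time traces a trade-off curve between the rewards without retraining. Floating-point state is assumed.

```python
import copy
import torch

@torch.no_grad()
def rewarded_soup(policies, lambdas):
    """Return a model whose weights are sum_i lambda_i * theta_i."""
    assert abs(sum(lambdas) - 1.0) < 1e-6 and all(lam >= 0 for lam in lambdas)
    states = [p.state_dict() for p in policies]
    mixed = {k: sum(lam * s[k] for lam, s in zip(lambdas, states)) for k in states[0]}
    soup = copy.deepcopy(policies[0])
    soup.load_state_dict(mixed)
    return soup

# e.g. trace the front between two rewards by sweeping lambda in [0, 1]:
# soups = [rewarded_soup([pi_a, pi_b], [lam, 1 - lam]) for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
```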
Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: from a pre-trained foundation model, they fine-tune the weights on the target task of inter…
Towards efficient feature sharing in MIMO architectures
Multi-input multi-output architectures propose to train multiple subnetworks within one base network and then average the subnetwork predictions to benefit from ensembling for free. Despite some relative success, these architectures are wa…
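A toy sketch of the multi-input multi-output idea summarized above: during training, M inputs share one backbone and M heads, and at test time the same input is fed to every slot so the M predictions can be averaged as a free ensemble. The architecture below is deliberately tiny and illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class TinyMIMO(nn.Module):
    """Two-input, two-output classifier sharing a single backbone."""
    def __init__(self, in_dim=784, hidden=256, num_classes=10, m=2):
        super().__init__()
        self.m = m
        self.backbone = nn.Sequential(nn.Linear(m * in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, num_classes) for _ in range(m))

    def forward(self, xs):
        # xs: list of m tensors, one per subnetwork input (identical at test time).
        features = self.backbone(torch.cat(xs, dim=-1))
        return [head(features) for head in self.heads]

# Test time: feed the same input m times and average the m predictions.
model = TinyMIMO()
x = torch.randn(8, 784)
logits = torch.stack(model([x, x]), dim=0).mean(0)
```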
Diverse Weight Averaging for Out-of-Distribution Generalization
Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strate…
DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
Deep network architectures struggle to continually learn new tasks without forgetting the previous tasks. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting effici…
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
Recent strategies achieved ensembling "for free" by fitting concurrently diverse subnetworks inside a single base network. The main idea during training is that each subnetwork learns to classify only one of the multiple inputs simultaneou…
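A minimal sketch of the mixing step in the linear variant: the two embedded inputs are interpolated before the shared trunk, with a factor chosen to keep the mixed features at roughly single-input scale. The paper's preferred variant instead mixes with binary CutMix-like masks and reweights the two classification losses accordingly; none of that is shown here.

```python
import torch

def linear_mixmo_mix(feat_a: torch.Tensor, feat_b: torch.Tensor, lam: float) -> torch.Tensor:
    """Mix the embeddings of the two subnetworks' inputs (Linear-MixMo style).

    lam is typically drawn from a Beta distribution at each training step;
    each of the two heads then classifies its own original input.
    """
    return 2.0 * (lam * feat_a + (1.0 - lam) * feat_b)
```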
Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Learning robust models that generalize well under changes in the data distribution is critical for real-world applications. To this end, there has been a growing surge of interest to learn simultaneously from multiple training domains - wh…
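A simplified sketch of the kind of penalty the title refers to: compute per-sample gradients in each training domain, take their variance, and penalize the gap between each domain's variance and the mean across domains. The loop over samples is deliberately naive; in practice this is restricted to the classifier's parameters and computed with efficient per-sample gradient tooling.

```python
import torch

def fishr_style_penalty(model, losses_per_domain, params=None):
    """Match the variance of per-sample gradients across training domains.

    `losses_per_domain` is a list (one entry per domain) of 1-D tensors of
    per-example losses, computed with the graph still attached.
    """
    if params is None:
        params = [p for p in model.parameters() if p.requires_grad]
    variances = []
    for losses in losses_per_domain:
        per_sample_grads = []
        for loss in losses:
            g = torch.autograd.grad(loss, params, retain_graph=True, create_graph=True)
            per_sample_grads.append(torch.cat([gi.flatten() for gi in g]))
        grads = torch.stack(per_sample_grads)              # (n_samples, n_params)
        variances.append(grads.var(dim=0, unbiased=False))
    mean_variance = torch.stack(variances).mean(dim=0)
    # Penalize each domain's deviation from the average gradient variance.
    return sum(((v - mean_variance) ** 2).mean() for v in variances)
```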
DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation
Deep ensembles perform better than a single network thanks to the diversity among their members. Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performances. In t…
CORE: Color Regression for Multiple Colors Fashion Garments
Among all fashion attributes, color is challenging to detect due to its subjective perception. Existing classification approaches can not go beyond the predefined list of discrete color names. In this paper, we argue that color detection i…
CoRe: Color Regression for Multicolor Fashion Garments
Developing deep networks that analyze fashion garments has many real-world applications. Among all fashion attributes, color is one of the most important yet challenging to detect. Existing approaches are classification-based and thus cann…
OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation
Object detectors tend to perform poorly in new or open domains, and require exhaustive yet costly annotations from fully labeled datasets. We aim at benefiting from several datasets with different categories but without additional label…