Moin Nabi
Learning Private Representations through Entropy-based Adversarial Training
How can we learn a representation with high predictive power while preserving user privacy? We present an adversarial representation learning method for sanitizing sensitive content from the learned representation. Specifically, we introdu…
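To make the idea concrete, here is a minimal sketch (assuming a PyTorch setup with hypothetical layer sizes and loss weight, not the paper's exact architecture) of entropy-based adversarial training: an adversary learns to recover a sensitive attribute from the representation, while the encoder learns the task and simultaneously maximizes the entropy of the adversary's prediction.

```python
# Minimal sketch of entropy-based adversarial sanitization (illustrative only;
# layer sizes and the 0.1 loss weight are hypothetical, not the paper's configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
task_head = nn.Linear(16, 10)   # downstream label head (10 classes, assumed)
adversary = nn.Linear(16, 2)    # sensitive-attribute head (binary, assumed)

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

def entropy(logits):
    # average entropy of the predicted distribution
    return -(F.softmax(logits, -1) * F.log_softmax(logits, -1)).sum(-1).mean()

x = torch.randn(8, 64)
y_task = torch.randint(0, 10, (8,))
y_priv = torch.randint(0, 2, (8,))

# 1) adversary step: try to recover the sensitive attribute from the frozen representation
adv_loss = F.cross_entropy(adversary(encoder(x).detach()), y_priv)
opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

# 2) encoder/task step: keep predictive power, but maximize the adversary's entropy
z = encoder(x)
main_loss = F.cross_entropy(task_head(z), y_task) - 0.1 * entropy(adversary(z))
opt_main.zero_grad(); main_loss.backward(); opt_main.step()
```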
From Dense to Dynamic: Token-Difficulty Driven MoEfication of Pre-Trained LLMs
Training large language models (LLMs) for different inference constraints is computationally expensive, limiting control over efficiency-accuracy trade-offs. Moreover, once trained, these models typically process tokens uniformly, regardle…
Multimodal Autoregressive Pre-training of Large Vision Encoders
We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this …
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF
In Large Language Model (LLM) development, Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning models with human values and preferences. RLHF traditionally relies on the Kullback-Leibler (KL) divergence between the cu…
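For context, the KL-regularized objective that standard RLHF optimizes (with the reference policy typically fixed to the supervised fine-tuned model) is

$$ \max_{\pi_\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[ r(x, y) \big] \;-\; \beta\, \mathrm{KL}\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big), $$

where $r$ is the reward model and $\beta$ controls how far the policy may move from $\pi_{\mathrm{ref}}$. Going by the title, a soup-based variant would presumably replace the single reference checkpoint with a weight-averaged model, though the truncated abstract does not spell this out.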
Computational Bottlenecks of Training Small-scale Large Language Models
While large language models (LLMs) dominate the AI landscape, small-scale large language models (SLMs) are gaining attention due to cost and efficiency demands from consumers. However, there is limited research on the training behavior and…
Chain-of-Sketch: Enabling Global Visual Reasoning
Modern vision models have achieved remarkable success in benchmarks where local features provide critical information about the target. There is now a growing interest in tackling tasks requiring more global reasoning, where local features…
KV Prediction for Improved Time to First Token
Inference with transformer-based language models begins with a prompt processing step. In this step, the model generates the first output token and stores the KV cache needed for future generation steps. This prompt processing step can be …
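As a rough illustration of the two phases the abstract describes, here is a toy single-head attention step with a growing KV cache (not any particular model, and the causal mask is omitted for brevity): the prefill pass processes the full prompt and builds the cache, and each subsequent decode step reuses it.

```python
# Toy illustration of prompt processing (prefill) vs. per-token decoding with a
# KV cache; single attention head, random weights, causal mask omitted for brevity.
import torch
import torch.nn.functional as F

d = 16
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def attend(x, k_cache, v_cache):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    k_cache = torch.cat([k_cache, k], dim=0)   # append new keys/values to the cache
    v_cache = torch.cat([v_cache, v], dim=0)
    attn = F.softmax(q @ k_cache.T / d ** 0.5, dim=-1)
    return attn @ v_cache, k_cache, v_cache

# Prefill: the whole prompt is processed at once; this step dominates time to first token.
prompt = torch.randn(128, d)
k_cache, v_cache = torch.empty(0, d), torch.empty(0, d)
_, k_cache, v_cache = attend(prompt, k_cache, v_cache)

# Decode: each new token attends only against the cached keys and values.
_, k_cache, v_cache = attend(torch.randn(1, d), k_cache, v_cache)
```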
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advancements in mixture-of-experts (MoE) models, speculati…
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
The pre-training phase of language models often begins with randomly initialized parameters. With the current trends in scaling models, training their large number of parameters can be extremely slow and costly. In contrast, small language…
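As a hedged sketch of the general idea of reusing small-model weights, the snippet below expands a linear layer in a function-preserving way by block-replicating and rescaling its weights. This is a generic construction for illustration; the paper's actual initialization scheme may differ.

```python
# Generic function-preserving expansion of a linear layer (illustrative only;
# not necessarily the initialization used in the paper).
import torch
import torch.nn as nn

def expand_linear(small: nn.Linear, factor: int = 2) -> nn.Linear:
    big = nn.Linear(small.in_features * factor, small.out_features * factor)
    big.weight.data = small.weight.data.repeat(factor, factor) / factor
    big.bias.data = small.bias.data.repeat(factor)
    return big

small = nn.Linear(8, 4)
big = expand_linear(small)

x = torch.randn(1, 8)
# Duplicated inputs produce duplicated outputs, so the small layer's function is preserved.
print(torch.allclose(big(x.repeat(1, 2)), small(x).repeat(1, 2), atol=1e-6))  # True
```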
Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models
The generation of toxic content by large language models (LLMs) remains a critical challenge for the safe deployment of language technology. We propose a novel framework for implicit knowledge editing and controlled text generation by fine…
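For reference, the perplexity of a continuation under a language model is the exponential of its average negative log-likelihood; a contrastive use of it would compare this quantity across desirable and undesirable continuations. The snippet below is only a minimal computation on toy logits, not the paper's training objective.

```python
# Perplexity of a token sequence given next-token logits (toy example).
import torch
import torch.nn.functional as F

def perplexity(logits, targets):
    # logits: (T, V) next-token scores; targets: (T,) gold token ids
    return torch.exp(F.cross_entropy(logits, targets, reduction="mean"))

logits = torch.randn(12, 50257)            # hypothetical vocabulary size
targets = torch.randint(0, 50257, (12,))
print(perplexity(logits, targets))
```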
A soft nearest-neighbor framework for continual semi-supervised learning
Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-su…
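For intuition, a soft nearest-neighbor prediction replaces a hard 1-NN vote with a softmax over distances to the labeled embeddings; the sketch below is a generic version of that mechanism with hypothetical dimensions, not the paper's full method.

```python
# Generic soft nearest-neighbor classifier: class probabilities are a
# distance-weighted average of the labels of all support embeddings.
import torch
import torch.nn.functional as F

def soft_nn_predict(query, support, support_labels, num_classes, temperature=0.1):
    sims = -torch.cdist(query, support) / temperature            # (Q, N) similarities
    weights = F.softmax(sims, dim=1)                             # soft neighborhoods
    one_hot = F.one_hot(support_labels, num_classes).float()     # (N, C) label matrix
    return weights @ one_hot                                     # (Q, C) class probabilities

support = torch.randn(50, 32)                 # labeled embeddings (toy)
labels = torch.randint(0, 10, (50,))
probs = soft_nn_predict(torch.randn(4, 32), support, labels, num_classes=10)
```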
Semi-supervised learning made simple with self-supervised clustering
Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations. However, in many real-world scenarios, labels are partially available, motivating a recent line of work on semi-super…
miCSE: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings
This paper presents miCSE, a mutual information-based contrastive learning framework that significantly advances the state-of-the-art in few-shot sentence embedding. The proposed approach imposes alignment between the attention pattern of d…
Mixture-of-experts VAEs can disregard variation in surjective multimodal data
Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational a…
Uncertainty-Aware Contrastive Distillation for Incremental Semantic Segmentation
A fundamental and challenging problem in deep learning is catastrophic forgetting, i.e., the tendency of neural networks to fail to preserve the knowledge acquired from old tasks when learning new tasks. This problem has been widely invest…
SCD: Self-Contrastive Decorrelation of Sentence Embeddings
In this paper, we propose Self-Contrastive Decorrelation (SCD), a self-supervised approach. Given an input sentence, it optimizes a joint self-contrastive and decorrelation objective. Learning a representation is facilitated by leveraging …
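A generic feature-decorrelation penalty of the kind the abstract mentions pushes the off-diagonal entries of the embedding correlation matrix toward zero, so that dimensions carry non-redundant information. The sketch below is illustrative and not necessarily SCD's exact objective.

```python
# Illustrative decorrelation penalty on a batch of embeddings.
import torch

def decorrelation_loss(z):
    z = (z - z.mean(0)) / (z.std(0) + 1e-6)   # standardize each embedding dimension
    c = (z.T @ z) / z.shape[0]                # (D, D) correlation matrix
    off_diag = c - torch.diag(torch.diag(c))
    return (off_diag ** 2).sum()

z = torch.randn(256, 128)                     # toy batch of sentence embeddings
loss = decorrelation_loss(z)
```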
A Unified Objective for Novel Class Discovery
In this paper, we study the problem of Novel Class Discovery (NCD). NCD aims at inferring novel object categories in an unlabeled set by leveraging from prior knowledge of a labeled set containing different, but related classes. Existing a…
Towards Zero-shot Commonsense Reasoning with Self-supervised Refinement of Language Models
Can we get existing language models and refine them for zero-shot commonsense reasoning? This paper presents an initial study exploring the feasibility of zero-shot commonsense reasoning for the Winograd Schema Challenge by formulating the…
Attention-based Contrastive Learning for Winograd Schemas
Self-supervised learning has recently attracted considerable attention in the NLP community for its ability to learn discriminative features using a contrastive objective. This paper investigates whether contrastive learning can be extende…
Solo-learn: A Library of Self-supervised Methods for Visual Representation Learning
This paper presents solo-learn, a library of self-supervised methods for visual representation learning. Implemented in Python, using PyTorch and PyTorch Lightning, the library fits both research and industry needs by featuring distributed…
EaSe: A Diagnostic Tool for VQA based on Answer Diversity
We propose EASE, a simple diagnostic tool for Visual Question Answering (VQA) which quantifies the difficulty of an (image, question) sample. EASE is based on the pattern of answers provided by multiple annotators to a given question. In par…
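One natural way to quantify answer diversity is the entropy of the annotators' answer distribution: full agreement gives zero entropy (easy), while many distinct answers give high entropy (hard). The helper below is a hypothetical scoring function for illustration; the paper's exact formulation may differ.

```python
# Entropy of the answer distribution over annotators as a difficulty proxy (illustrative).
from collections import Counter
import math

def answer_entropy(answers):
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(answer_entropy(["cat"] * 10))                          # 0.0: annotators agree, easy sample
print(answer_entropy(["cat", "dog", "fox", "cow", "owl"]))   # ~2.32: high diversity, hard sample
```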
Multimodal Self-supervised Learning for Medical Image Analysis
Multimodal Prototypical Networks for Few-shot Learning
Although providing exceptional results for many computer vision tasks, state-of-the-art deep learning algorithms catastrophically struggle in low data scenarios. However, if data in additional modalities exist (e.g. text) this can compensa…
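For background, the standard prototypical-network step that such few-shot methods build on computes each class prototype as the mean of its support embeddings and assigns a query to the nearest prototype; the sketch below shows only the unimodal version with toy dimensions.

```python
# Standard prototypical-network classification step (unimodal, toy sizes).
import torch

def prototypes(support_emb, support_labels, num_classes):
    # one prototype per class: the mean of that class's support embeddings
    return torch.stack([support_emb[support_labels == c].mean(0) for c in range(num_classes)])

support = torch.randn(20, 64)                      # 5-way, 4-shot support embeddings
labels = torch.arange(5).repeat_interleave(4)      # class of each support embedding
protos = prototypes(support, labels, num_classes=5)

query = torch.randn(3, 64)
pred = torch.cdist(query, protos).argmin(dim=1)    # nearest-prototype prediction
```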