Pramuditha Perera
Descriminative-Generative Custom Tokens for Vision-Language Models
This paper explores the possibility of learning custom tokens for representing new concepts in Vision-Language Models (VLMs). Our aim is to learn tokens that can be effective for both discriminative and generative tasks while composing wel…
Compositional Structures in Neural Embedding and Interaction Decompositions
We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks and conditional independence constraints on the probability distributions modeled by these networks. Our framewor…
Multi-Modal Hallucination Control by Visual Information Grounding
Generative Vision-Language Models (VLMs) are prone to generate plausible-sounding textual answers that, however, are not always grounded in the input image. We investigate this phenomenon, usually referred to as "hallucination" and show th…
Meaning Representations from Trajectories in Autoregressive Models
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text. This strategy is prompt-free, does not require fine-tuning, and is appl…
Prompt Algebra for Task Composition
We investigate whether prompts learned independently for different tasks can be later combined through prompt algebra to obtain a model that supports composition of tasks. We consider Visual Language Models (VLM) with prompt tuning as our …
À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting
We introduce À-la-carte Prompt Tuning (APT), a transformer-based scheme to tune prompts on distinct data so that they can be arbitrarily composed at inference time. The individual prompts can be trained in isolation, possibly on different …
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs using world knowledge. Recently, pre-trained Language Models (PLM) such as GPT-3 have been applied to the task…
Benchmarking Diverse-Modal Entity Linking with Generative Models
Entities can be expressed in diverse formats, such as texts, images, or column names and cell values in tables. While existing entity linking (EL) models work well on per modality configuration, such as text-only EL, visual grounding, or s…
Train/Test-Time Adaptation with Retrieval
We introduce Train/Test-Time Adaptation with Retrieval (${\rm T^3AR}$), a method to adapt models both at train and test time by means of a retrieval module and a searchable pool of external samples. Before inference, ${\rm T^3AR}$ adapts a…
Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs). Traditionally, compositionality has been associated with algebraic operations on embeddings of words from a pre-existing vocabulary.…
Benchmarking Diverse-Modal Entity Linking with Generative Models
Sijia Wang, Alexander Hanbo Li, Henghui Zhu, Sheng Zhang, Pramuditha Perera, Chung-Wei Hang, Jie Ma, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Bing Xiang, Patrick Ng. Findings of the Association for Computational Linguistics: ACL …
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
Xingyu Fu, Sheng Zhang, Gukyeong Kwon, Pramuditha Perera, Henghui Zhu, Yuhao Zhang, Alexander Hanbo Li, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Patrick Ng, Dan Roth, Bing Xiang. Findings of the Association for Computational Ling…
Open-set Adversarial Defense with Clean-Adversarial Mutual Learning
Open-set recognition and adversarial defense study two key aspects of deep learning that are vital for real-world deployment. The objective of open-set recognition is to identify samples from open-set classes during testing, while adversar…
Federated Generalized Face Presentation Attack Detection
Face presentation attack detection plays a critical role in the modern face recognition pipeline. A face presentation attack detection model with good generalization can be obtained when it is trained with face images from different input …
A Joint Representation Learning and Feature Modeling Approach for One-class Recognition
One-class recognition is traditionally approached either as a representation learning problem or a feature modeling problem. In this work, we argue that both of these approaches have their own limitations; and a more effective solution can…
One-Class Classification: A Survey
One-Class Classification (OCC) is a special case of multi-class classification, where data observed during training is from a single positive class. The goal of OCC is to learn a representation and/or a classifier that enables recognition …
Quickest Intruder Detection For Multiple User Active Authentication
In this paper, we investigate how to detect intruders with low latency for Active Authentication (AA) systems with multiple-users. We extend the Quickest Change Detection (QCD) framework to the multiple-user case and formulate the Multiple…
Anomaly Detection-Based Unknown Face Presentation Attack Detection
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection (fPAD), where a spoof detector is learned using only non-attacked images of users. These detectors are of practical importance as …
Open-set Adversarial Defense
Open-set recognition and adversarial defense study two key aspects of deep learning that are vital for real-world deployment. The objective of open-set recognition is to identify samples from open-set classes during testing, while adversar…
Federated Face Presentation Attack Detection
Face presentation attack detection (fPAD) plays a critical role in the modern face recognition pipeline. A face presentation attack detection model with good generalization can be obtained when it is trained with face images from different…
Federated Face Anti-spoofing
Face presentation attack detection plays a critical role in the modern face recognition pipeline. A face anti-spoofing (FAS) model with good generalization can be obtained when it is trained with face images from different input distributi…
Deep Transfer Learning for Multiple Class Novelty Detection
We propose a transfer learning-based solution for the problem of multiple class novelty detection. In particular, we propose an end-to-end deep-learning based approach in which we investigate how the knowledge contained in an external, out…
OCGAN: One-class Novelty Detection Using GANs with Constrained Latent Representations
We present a novel model called OCGAN for the classical problem of one-class novelty detection, where, given a set of examples from a particular class, the goal is to determine if a query example is from the same class. Our solution is bas…
In2I: Unsupervised Multi-Image-to-Image Translation Using Generative Adversarial Networks
In unsupervised image-to-image translation, the goal is to learn the mapping between an input image and an output image using a set of unpaired training images. In this paper, we propose an extension of the unsupervised image-to-image tran…
Non-intrusive load monitoring based on low frequency active power measurements
A Non-Intrusive Load Monitoring (NILM) method for residential appliances based on active power signal is presented. This method works effectively with a single active power measurement taken at a low sampling rate (1 s). The propose…