PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models

Jenny Schmalfuß, Nadine Chang, Vibashan VS, Maying Shen, Andrés Bruhn, et al. · 2025

Vision language models (VLMs) respond to user-crafted text prompts and visual inputs, and are applied to numerous real-world problems. VLMs integrate visual modalities with large language models (LLMs), which are well known to be prompt-se…

SegFace: Face Segmentation of Long-Tail Classes

Kartik Narayan, Vibashan VS, Vishal M. Patel · 2025

Face parsing refers to the semantic segmentation of human faces into key facial regions such as eyes, nose, hair, etc. It serves as a prerequisite for various advanced applications, including face editing, face swapping, and facial makeup,…

Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models

Zhiqi Li, Chen Guo, Shilong Liu, Shihao Wang, Vibashan VS, et al. · 2025

Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights,…

FaceXBench: Evaluating Multimodal LLMs on Face Understanding

Kartik Narayan, Vibashan VS, Vishal M. Patel · 2025

Multimodal Large Language Models (MLLMs) demonstrate impressive problem-solving abilities across a wide range of tasks and domains. However, their capacity for face understanding has not been systematically studied. To address this gap, we…

SegFace: Face Segmentation of Long-Tail Classes

Kartik Narayan, Vibashan VS, Vishal M. Patel · 2024

Face parsing refers to the semantic segmentation of human faces into key facial regions such as eyes, nose, hair, etc. It serves as a prerequisite for various advanced applications, including face editing, face swapping, and facial makeup,…

Entropic Open-Set Active Learning

Bardia Safaei, Vibashan VS, Celso M. de Melo, Vishal M. Patel · 2024

Active Learning (AL) aims to enhance the performance of deep models by selecting the most informative samples for annotation from a pool of unlabeled data. Despite impressive performance in closed-set settings, most AL methods fail in real…

FaceXFormer: A Unified Transformer for Facial Analysis

Kartik Narayan, Vibashan VS, Rama Chellappa, Vishal M. Patel · 2024

In this work, we introduce FaceXFormer, an end-to-end unified transformer model capable of performing ten facial analysis tasks within a single framework. These tasks include face parsing, landmark detection, head pose estimation, attribut…

PosSAM: Panoptic Open-vocabulary Segment Anything

Vibashan VS, Shubhankar Borse, Hyojin Park, Debasmit Das, Vishal M. Patel, et al. · 2024

In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in gener…

Entropic Open-set Active Learning

Bardia Safaei, Vibashan VS, Celso M. de Melo, Vishal M. Patel · 2023

Active Learning (AL) aims to enhance the performance of deep models by selecting the most informative samples for annotation from a pool of unlabeled data. Despite impressive performance in closed-set settings, most AL methods fail in real…

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

Vibashan VS, Ning Yu, Xing Chen, Can Qin, Mingfei Gao, et al. · 2023

Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, limiting the scalability to annotate novel (new) c…

Open-Set Automatic Target Recognition

Bardia Safaei, Vibashan VS, Celso M. de Melo, Shuowen Hu, Vishal M. Patel · 2022

Automatic Target Recognition (ATR) is a category of computer vision algorithms which attempts to recognize targets on data obtained from different sensors. ATR algorithms are extensively used in real-world scenarios such as military and su…

Towards Online Domain Adaptive Object Detection

Vibashan VS, Poojan Oza, Vishal M. Patel · 2022

Existing object detection models assume both the training and test data are sampled from the same source domain. This assumption does not hold true when these detectors are deployed in real-world applications, where they encounter new visu…

Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection

Vibashan VS, Poojan Oza, Vishal M. Patel · 2022

Unsupervised Domain Adaptation (UDA) is an effective approach to tackle the issue of domain shift. Specifically, UDA methods try to align the source and target representations to improve the generalization on the target domain. Further, UD…

Target and Task specific Source-Free Domain Adaptive Image Segmentation

Vibashan VS, Jeya Maria Jose Valanarasu, Vishal M. Patel · 2022

Solving the domain shift problem during inference is essential in medical imaging, as most deep-learning based solutions suffer from it. In practice, domain shifts are tackled by performing Unsupervised Domain Adaptation (UDA), where a mod…

On-the-Fly Test-time Adaptation for Medical Image Segmentation

Jeya Maria Jose Valanarasu, Vibashan VS, Vishal M. Patel · 2022

One major problem in deep learning-based solutions for medical imaging is the drop in performance when a model is tested on a data distribution different from the one that it is trained on. Adapting the source model to target data distribu…

ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath While Tracking Instruments in Robotic Surgery

Mobarakol Islam, Vibashan VS, Chwee Ming Lim, Hongliang Ren · 2021

Representation learning of the task-oriented attention while tracking instrument holds vast potential in image-guided robotic surgery. Incorporating cognitive ability to automate the camera control enables the surgeon to concentrate more o…

Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using Meta-Learning

Vibashan VS, Domenick Poster, Suya You, Shuowen Hu, Vishal M. Patel · 2021

Object detectors trained on large-scale RGB datasets are being extensively employed in real-world applications. However, these RGB-trained models suffer a performance drop under adverse illumination and lighting conditions. Infrared (IR) c…

Image Fusion Transformer

Vibashan VS, Jeya Maria Jose Valanarasu, Poojan Oza, Vishal M. Patel · 2021

In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information. In recent years, state-of-the-art methods have adopted Convolution Neural Networks (CNNs) to encode meaningful features…

MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

Vibashan VS, Vikram Gupta, Poojan Oza, Vishwanath A. Sindagi, Vishal M. Patel · 2021

Existing approaches for unsupervised domain adaptive object detection perform feature alignment via adversarial training. While these methods achieve reasonable improvements in performance, they typically perform category-agnostic domain a…

Unsupervised Domain Adaptation of Object Detectors: A Survey

Poojan Oza, Vishwanath A. Sindagi, Vibashan VS, Vishal M. Patel · 2021

Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications such as classification, segmentation, and detection. However, learning highly accurate models relies on …

Brain Tumor Segmentation and Survival Prediction using 3D Attention UNet

Mobarakol Islam, Vibashan VS, V. Jeya Maria Jose, Navodini Wijethilake, Utkarsh Uppal, et al. · 2021

In this work, we develop an attention convolutional neural network (CNN) to segment brain tumors from Magnetic Resonance Images (MRI). Further, we predict the survival rate using various machine learning methods. We adopt a 3D UNet archite…

Vibashan VS