Vibashan VS
PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models
Vision language models (VLMs) respond to user-crafted text prompts and visual inputs, and are applied to numerous real-world problems. VLMs integrate visual modalities with large language models (LLMs), which are well known to be prompt-se…
SegFace: Face Segmentation of Long-Tail Classes
Face parsing refers to the semantic segmentation of human faces into key facial regions such as eyes, nose, hair, etc. It serves as a prerequisite for various advanced applications, including face editing, face swapping, and facial makeup,…
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights,…
FaceXBench: Evaluating Multimodal LLMs on Face Understanding
Multimodal Large Language Models (MLLMs) demonstrate impressive problem-solving abilities across a wide range of tasks and domains. However, their capacity for face understanding has not been systematically studied. To address this gap, we…
Entropic Open-Set Active Learning
Active Learning (AL) aims to enhance the performance of deep models by selecting the most informative samples for annotation from a pool of unlabeled data. Despite impressive performance in closed-set settings, most AL methods fail in real…
FaceXFormer: A Unified Transformer for Facial Analysis
In this work, we introduce FaceXFormer, an end-to-end unified transformer model capable of performing ten facial analysis tasks within a single framework. These tasks include face parsing, landmark detection, head pose estimation, attribut…
PosSAM: Panoptic Open-vocabulary Segment Anything
In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in gener…
Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
Existing instance segmentation models learn task-specific information using manual mask annotations from base (training) categories. These mask annotations require tremendous human effort, limiting the scalability to annotate novel (new) c…
Open-Set Automatic Target Recognition
Automatic Target Recognition (ATR) is a category of computer vision algorithms which attempts to recognize targets on data obtained from different sensors. ATR algorithms are extensively used in real-world scenarios such as military and su…
Towards Online Domain Adaptive Object Detection
Existing object detection models assume both the training and test data are sampled from the same source domain. This assumption does not hold true when these detectors are deployed in real-world applications, where they encounter new visu…
Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
Unsupervised Domain Adaptation (UDA) is an effective approach to tackle the issue of domain shift. Specifically, UDA methods try to align the source and target representations to improve the generalization on the target domain. Further, UD…
Target and Task specific Source-Free Domain Adaptive Image Segmentation
Solving the domain shift problem during inference is essential in medical imaging, as most deep-learning based solutions suffer from it. In practice, domain shifts are tackled by performing Unsupervised Domain Adaptation (UDA), where a mod…
On-the-Fly Test-time Adaptation for Medical Image Segmentation
One major problem in deep learning-based solutions for medical imaging is the drop in performance when a model is tested on a data distribution different from the one that it is trained on. Adapting the source model to target data distribu…
ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath While Tracking Instruments in Robotic Surgery
Representation learning of the task-oriented attention while tracking instruments holds vast potential in image-guided robotic surgery. Incorporating cognitive ability to automate the camera control enables the surgeon to concentrate more o…
Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using Meta-Learning
Object detectors trained on large-scale RGB datasets are being extensively employed in real-world applications. However, these RGB-trained models suffer a performance drop under adverse illumination and lighting conditions. Infrared (IR) c…
Image Fusion Transformer
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information. In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features…
MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
Existing approaches for unsupervised domain adaptive object detection perform feature alignment via adversarial training. While these methods achieve reasonable improvements in performance, they typically perform category-agnostic domain a…
Unsupervised Domain Adaptation of Object Detectors: A Survey
Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications such as classification, segmentation, and detection. However, learning highly accurate models relies on …
Brain Tumor Segmentation and Survival Prediction using 3D Attention UNet
In this work, we develop an attention convolutional neural network (CNN) to segment brain tumors from Magnetic Resonance Images (MRI). Further, we predict the survival rate using various machine learning methods. We adopt a 3D UNet archite…