Anjan Dutta
Copy‐Paste Augmentation Improves Automatic Species Identification in Camera Trap Images
Effective conservation requires effective biodiversity monitoring. The pace of global biodiversity change far outstrips the ability of manual fieldwork to monitor it. Therefore, technological solutions, like camera traps, have emerged as a…
DreamPet: Text Driven Controllable 3D Animal Generation using Gaussian Splatting
Realistic 3D animal generation from text prompts is a significant yet challenging task. Traditional approaches, which use score distillation sampling to optimize 3D formats like meshes or neural fields, often suffer from a lack of detail a…
A Closer Look at Multimodal Representation Collapse
We aim to develop a fundamental understanding of modality collapse, a recently observed empirical phenomenon wherein models trained for multimodal fusion tend to rely only on a subset of the modalities, ignoring the rest. We show that moda…
Skin Disease Detector using CNN
Skin conditions are common and frequently need a quick and precise diagnosis in order to be treated. In this paper, we propose a Convolutional Neural Network (CNN) based skin disease detection system. CNNs are ideally suited for the id…
OmniCount: Multi-label Object Counting with Semantic-Geometric Priors
Object counting is pivotal for understanding the composition of scenes. Previously, this task was dominated by class-specific methods, which have gradually evolved into more adaptable class-agnostic strategies. However, these strategies co…
Copy-paste augmentation improves automatic species identification in camera trap images
Effective conservation requires effective biodiversity monitoring. The pace of global biodiversity change far outstrips the ability of manual fieldwork to monitor it. Therefore, technological solutions, like camera traps, have emerged a…
CraftSVG: Multi-Object Text-to-SVG Synthesis via Layout Guided Diffusion
Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects,…
Learning Conditional Invariances through Non-Commutativity
Invariance learning algorithms that conditionally filter out domain-specific random variables as distractors, do so based only on the data semantics, and not the target domain under evaluation. We show that a provably optimal and sample-ef…
CLIPDraw++: Text-to-Sketch Synthesis with Simple Primitives
With the goal of understanding the visual concepts that CLIP associates with text prompts, we show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like straight lines…
Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships
Recent advances in fine-grained representation learning leverage local-to-global (emergent) relationships for achieving state-of-the-art results. The relational representations relied upon by such methods, however, are abstract. We aim to …
In-shop Clothes Retrieval
In-shop Clothes Retrieval
Actor-agnostic Multi-label Action Recognition with Multi-modal Query
Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome mod…
Animal Kingdom AR
Animal Kingdom
Torque reversal and cyclotron absorption feature in HMXB 4U 1538-522
We present a comprehensive timing and spectral analysis of the HMXB 4U 1538-522 by using the Nuclear Spectroscopic Telescope Array (NuSTAR) observatory data. Using three archived observations made between 2019 and 2021, we have detected $\…
Data-Free Sketch-Based Image Retrieval
Rising concerns about privacy and anonymity preservation of deep learning models have facilitated research in data-free learning (DFL). For the first time, we identify that for data-scarce tasks like Sketch-Based Image Retrieval (SBIR), wh…
Places2_simp
Simplified version of Places2 dataset.
HMDB_simp
Simplified version of HMDB51 dataset.
Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval
Representation learning for sketch-based image retrieval has mostly been tackled by learning embeddings that discard modality-specific information. As instances from different modalities can often provide complementary information describi…
Relational Proxies: Emergent Relationships as Fine-Grained Discriminators
Fine-grained categories that largely share the same set of parts cannot be discriminated based on part information alone, as they mostly differ in the way the local parts relate to the overall global structure of the object. We propose Rel…
Abstracting Sketches through Simple Primitives
Humans show a high level of abstraction capability in games that require quickly communicating object information. They decompose the message content into multiple parts and communicate them in an interpretable protocol. Toward equipping m…
BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR
The efficacy of zero-shot sketch-based image retrieval (ZS-SBIR) models is governed by two challenges. The immense distributions-gap between the sketches and the images requires a proper domain alignment. Moreover, the fine-grained nature …
Concurrent Discrimination and Alignment for Self-Supervised Feature Learning
Existing self-supervised learning methods learn representation by means of pretext tasks which are either (1) discriminating that explicitly specify which features should be separated or (2) aligning that precisely indicate which features …
Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning
Zero-Shot Learning (ZSL) aims to recognise unseen object classes, which are not observed during the training phase. The existing body of works on ZSL mostly relies on pretrained visual features and lacks the explicit attribute localisation…