Deva Ramanan
Towards Foundational Models for Single-Chip Radar
mmWave radars are compact, inexpensive, and durable sensors that are robust to occlusions and work regardless of environmental conditions, such as weather and darkness. However, this comes at the cost of poor angular resolution, especially…
Label Uncertainty for Ultrasound Segmentation
In medical imaging, inter-observer variability among radiologists often introduces label uncertainty, particularly in modalities where visual interpretation is subjective. Lung ultrasound (LUS) is a prime example: it frequently presents a m…
MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion
We address the problem of dynamic scene reconstruction from sparse-view videos. Prior work often requires dense multi-view captures with hundreds of calibrated cameras (e.g. Panoptic Studio). Such multi-view setups are prohibitively expens…
Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos
We explore novel-view synthesis for dynamic scenes from monocular videos. Prior approaches rely on costly test-time optimization of 4D representations or do not preserve scene geometry when trained in a feed-forward manner. Our approach is…
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Vision-language models (VLMs) trained on internet-scale data achieve remarkable zero-shot detection performance on common objects like car, truck, and pedestrian. However, state-of-the-art models still struggle to generalize to out-of-dist…
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Current Structure-from-Motion (SfM) methods typically follow a two-stage pipeline, combining learned or geometric pairwise reasoning with a subsequent global optimization step. In contrast, we propose a data-driven multi-view reasoning app…
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
We explore the task of geometric reconstruction of images captured from a mixture of ground and aerial views. Current state-of-the-art learning-based approaches fail to handle the extreme viewpoint variation between aerial-ground image pai…
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Many 3D generative models rely on variational autoencoders (VAEs) to learn compact shape representations. However, existing methods encode all shapes into a fixed-size token, disregarding the inherent variations in scale and complexity acr…
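The idea of letting token count adapt to shape complexity can be illustrated with a toy octree: occupied cells are recursively subdivided, so simple shapes yield few leaf cells while intricate ones yield many. This is a hypothetical sketch for intuition only, not the paper's actual tokenizer.

```python
import numpy as np

def adaptive_octree_leaves(points, center, half, depth, max_depth=4, max_pts=8):
    """Recursively subdivide occupied cells of an axis-aligned cube.
    Returns one (center, half-size) 'token' per occupied leaf cell, so the
    token count grows with geometric complexity. Toy illustration only."""
    points = np.asarray(points, float)
    center = np.asarray(center, float)
    inside = points[np.all(np.abs(points - center) <= half, axis=1)]
    if len(inside) == 0:
        return []                        # empty cells emit no token
    if depth == max_depth or len(inside) <= max_pts:
        return [(tuple(center), half)]   # one leaf token for this cell
    leaves = []
    for dx in (-0.5, 0.5):               # visit all eight child octants
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                child = center + half * np.array([dx, dy, dz])
                leaves += adaptive_octree_leaves(inside, child, half / 2,
                                                 depth + 1, max_depth, max_pts)
    return leaves
```

A single clustered point produces one token, while a scattered cloud in the same cube produces many, mirroring the variable-length encoding the abstract argues for.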
Using Diffusion Priors for Video Amodal Segmentation
Object permanence in humans is a fundamental cue that helps in understanding persistence of objects, even when they are fully occluded in the scene. Present day methods in object segmentation do not account for this amodal nature of the wo…
LEARNER: Contrastive Pretraining for Learning Fine-Grained Patient Progression from Coarse Inter-Patient Labels
Predicting whether a treatment leads to meaningful improvement is a central challenge in personalized medicine, particularly when disease progression manifests as subtle visual changes over time. While data-driven deep learning (DL) offers…
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Vision-language models (VLMs) have made significant progress in recent visual-question-answering (VQA) benchmarks that evaluate complex visio-linguistic reasoning. However, are these models truly effective? In this work, we show that VLMs …
Neural Eulerian Scene Flow Fields
We reframe scene flow as the task of estimating a continuous space-time ODE that describes motion for an entire observation sequence, represented with a neural prior. Our method, EulerFlow, optimizes this neural prior estimate against seve…
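Representing motion as a continuous space-time ODE means a point is advected by integrating a velocity field through time. The sketch below uses a simple analytic field as a hypothetical stand-in for the learned neural prior, with forward-Euler integration; it illustrates the formulation, not the paper's implementation.

```python
import numpy as np

def velocity_field(x, t):
    """Hypothetical stand-in for a learned velocity field f(x, t):
    constant drift along +x plus a mild time-dependent lift along +z."""
    return np.array([1.0, 0.0, 0.1 * t])

def euler_integrate(x0, t0, t1, num_steps=100):
    """Advect a 3D point from time t0 to t1 with forward-Euler steps,
    i.e. numerically solve dx/dt = f(x, t)."""
    x = np.asarray(x0, dtype=float)
    dt = (t1 - t0) / num_steps
    t = t0
    for _ in range(num_steps):
        x = x + dt * velocity_field(x, t)
        t += dt
    return x

p = euler_integrate([0.0, 0.0, 0.0], t0=0.0, t1=1.0)
```

Because the field is defined for any (x, t), the same representation yields flow between arbitrary pairs of times in the sequence, not just adjacent frames.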
Lidar Panoptic Segmentation in an Open World
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation
While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order reasoning such as logic and comparison. In this work, we conduct…
SMORE: Simultaneous Map and Object REconstruction
We present a method for dynamic surface reconstruction of large-scale urban scenes from LiDAR. Depth-based reconstructions tend to focus on small-scale objects or large-scale SLAM reconstructions that treat moving objects as outliers. We t…
Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection
State-of-the-art 3D object detectors are often trained on massive labeled datasets. However, annotating 3D bounding boxes remains prohibitively expensive and time-consuming, particularly for LiDAR. Instead, recent works demonstrate that se…
Reanimating Images using Neural Representations of Dynamic Stimuli
While computer vision models have made incredible strides in static image recognition, they still do not match human performance in tasks that require the understanding of complex, dynamic motion. This is notably true for real-world scenar…
RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
This repository contains all the collected and aligned data for RU-AI dataset. It is constructed based on three large publicly available datasets: Flickr8K, COCO, and Places205, by adding their corresponding machine-generated pairs from fi…
Predicting Long-horizon Futures by Conditioning on Geometry and Time
Our work explores the task of generating future sensor observations conditioned on the past. We are motivated by "predictive coding" concepts from neuroscience as well as robotic applications such as self-driving vehicles. Predictive video…
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Despite significant progress in generative AI, comprehensive evaluation remains challenging because of the lack of effective metrics and standardized benchmarks. For instance, the widely-used CLIPScore measures the alignment between a (gen…
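The CLIPScore mentioned above reduces to a cosine similarity between unit-normalized image and text embeddings. A minimal sketch of that computation, using toy vectors rather than outputs of an actual CLIP model:

```python
import numpy as np

def cosine_alignment(img_emb, txt_emb):
    """CLIPScore-style alignment: cosine similarity of unit-normalized
    image and text embeddings. Toy vectors stand in for CLIP features."""
    img = np.asarray(img_emb, float)
    txt = np.asarray(txt_emb, float)
    img = img / np.linalg.norm(img)
    txt = txt / np.linalg.norm(txt)
    return float(img @ txt)

# identical directions score 1.0; orthogonal embeddings score 0.0
```

Such a bag-of-features similarity is exactly what makes the metric blind to compositional structure (attribute binding, relations), motivating the image-to-text evaluation the paper proposes instead.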
Better Call SAL: Towards Learning to Segment Anything in Lidar
We propose the SAL (Segment Anything in Lidar) method consisting of a text-promptable zero-shot model for segmenting and classifying any object in Lidar, and a pseudo-labeling engine that facilitates model training without manual supervisi…
I Can't Believe It's Not Scene Flow!
Current scene flow methods broadly fail to describe motion on small objects, and current scene flow evaluation protocols hide this failure by averaging over many points, with most drawn from larger objects. To fix this evaluation failure, we pr…
Cameras as Rays: Pose Estimation via Ray Diffusion
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10). In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrins…
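The "camera as a bundle of rays" view can be made concrete: given intrinsics K and a world-to-camera pose (R, t), every pixel maps to a ray with a shared origin (the camera center) and a world-space direction. The sketch below shows one standard such conversion; the paper's exact ray parametrization may differ.

```python
import numpy as np

def pixel_rays(K, R, t, pixels):
    """Convert a pinhole camera (intrinsics K, world-to-camera rotation R,
    translation t) into per-pixel rays: a shared origin and unit world-space
    directions. One common parametrization, shown for illustration."""
    K = np.asarray(K, float)
    R = np.asarray(R, float)
    t = np.asarray(t, float)
    center = -R.T @ t                                      # camera center in world coords
    uv1 = np.column_stack([pixels, np.ones(len(pixels))])  # homogeneous pixel coords
    dirs_cam = (np.linalg.inv(K) @ uv1.T).T                # back-project into camera frame
    dirs_world = (R.T @ dirs_cam.T).T                      # rotate into world frame
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)
    return center, dirs_world
```

Treating the per-pixel rays, rather than the global (R, t, K) parameters, as the quantity to denoise is what makes the representation amenable to diffusion over sparse views.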
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Manually creating textures for 3D meshes is time-consuming, even for expert visual content creators. We propose a fast approach for automatically texturing an input 3D mesh based on a user-provided text prompt. Importantly, our approach di…
Improving Model's Interpretability and Reliability using Biomarkers
Accurate and interpretable diagnostic models are crucial in the safety-critical field of medicine. We investigate the interpretability of our proposed biomarker-based lung ultrasound diagnostic pipeline to enhance clinicians' diagnostic ca…
The Neglected Tails in Vision-Language Models
Vision-language models (VLMs) excel in zero-shot recognition but their performance varies greatly across different visual concepts. For example, although CLIP achieves impressive accuracy on ImageNet (60-80%), its performance drops below 1…
Fast and Modular Autonomy Software for Autonomous Racing Vehicles
Autonomous motorsports aim to replicate the human racecar driver with software and sensors. As in traditional motorsports, Autonomous Racing Vehicles (ARVs) are pushed to their handling limits in multi-agent scenarios at extremely high …
Revisiting Few-Shot Object Detection with Vision-Language Models
The era of vision-language models (VLMs) trained on web-scale datasets challenges conventional formulations of "open-world" perception. In this work, we revisit the task of few-shot object detection (FSOD) in the context of recent foundati…
TAO-Amodal: A Benchmark for Tracking Any Object Amodally
Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of hea…
Long-Tailed 3D Detection via Multi-Modal Fusion
Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for training 3D detectors. While class labels naturally follow a long-tailed distribution in the real world, existing benchmarks only focus on a few common classes (e…