Manolis Savva
MLFM: Multi-Layered Feature Maps for Richer Language Understanding in Zero-Shot Semantic Navigation
Recent progress in large vision-language models has driven improvements in language-based semantic navigation, where an embodied agent must reach a target object described in natural language. Yet we still lack a clear, language-focused ev…
Survey on Modeling of Human-made Articulated Objects
3D modeling of articulated objects is a research problem within computer vision, graphics, and robotics. Its objective is to understand the shape and motion of the articulated components, represent the geometry and mobility of object parts…
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
Despite recent advances in text-conditioned 3D indoor scene generation, there remain gaps in the evaluation of these methods. Existing metrics primarily assess the realism of generated scenes by comparing them to a set of ground-truth scen…
Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling
Reconstructing structured 3D scenes from RGB images using CAD objects unlocks efficient and compact scene representations that maintain compositionality and interactability. Existing works propose training-heavy methods relying on either e…
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
We address the challenge of creating 3D assets for household articulated objects from a single image. Prior work on articulated object creation either requires multi-view multi-state input, or only allows coarse control over the generation…
S2O: Static to Openable Enhancement for Articulated 3D Objects
Despite much progress in large 3D datasets, there are currently few interactive 3D object datasets, and their scale is limited due to the manual effort required in their construction. We introduce the static to openable (S2O) task which cre…
SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements
Despite advances in text-to-3D generation methods, generation of multi-object arrangements remains challenging. Current methods exhibit failures in generating physically plausible arrangements that respect the provided text description. We…
Text-to-3D Shape Generation
Recent years have seen an explosion of work and interest in text‐to‐3D shape generation. Much of the progress is driven by advances in 3D representations, large‐scale pretraining and representation learning for text and image data enabling…
R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding
We introduce the Reality-linked 3D Scenes (R3DS) dataset of synthetic 3D scenes mirroring the real-world scene arrangements from Matterport3D panoramas. Compared to prior work, R3DS has more complete and densely populated scenes with objec…
Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects
Single-view 3D shape retrieval is a challenging task that is increasingly important with the growth of available 3D data. Prior work that has studied this task has not focused on evaluating how realistic occlusions impact performance, and …
CAGE: Controllable Articulation GEneration
We address the challenge of generating 3D articulated objects in a controllable fashion. Currently, modeling articulated 3D objects is either achieved through laborious manual authoring, or using methods from prior work that are hard to sc…
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dan…
ShapeNet: An Information-Rich 3D Model Repository
We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a c…
LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning
In end-to-end autonomous driving, the utilization of existing sensor fusion techniques and navigational control methods for imitation learning proves inadequate in challenging situations that involve numerous dynamic agents. To address thi…
Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes
This report surveys advances in deep learning‐based modelling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes…
PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects
We address the task of simultaneous part-level reconstruction and motion parameter estimation for articulated objects. Given two sets of multi-view images of an object in two static articulation states, we decouple the movable part from th…
HomeRobot: Open-Vocabulary Mobile Manipulation
HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen…
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
We contribute the Habitat Synthetic Scene Dataset, a dataset of 211 high-quality 3D scenes, and use it to test navigation agent generalization to realistic 3D environments. Our dataset represents real interiors and contains a diverse set o…
Evaluating 3D Shape Analysis Methods for Robustness to Rotation Invariance
This paper analyzes the robustness of recent 3D shape descriptors to SO(3) rotations, something that is fundamental to shape modeling. Specifically, we formulate the task of rotated 3D object instance detection. To do so, we consider a dat…
MOPA: Modular Object Navigation with PointGoal Agents
We propose a simple but effective modular approach MOPA (Modular ObjectNav with PointGoal agents) to systematically investigate the inherent modularity of the object navigation task in Embodied AI. MOPA consists of four modules: (a) an obj…
OPDMulti: Openable Part Detection for Multiple Objects
Openable part detection is the task of detecting the openable parts of an object in a single-view image, and predicting corresponding motion parameters. Prior work investigated the unrealistic setting where all input images only contain a …
Emergence of Maps in the Memories of Blind Navigation Agents
Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit …
Retrospectives on the Embodied AI Workshop
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement,…
Habitat-Matterport 3D Semantics Dataset
We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. HM3DSEM is the largest dataset of 3D real-world spaces with densely annotated semantics that is currently available to the academic community. It consists of 142,646 object …
Articulated 3D Human-Object Interactions from RGB Videos: An Empirical Analysis of Approaches and Challenges
Human-object interactions with articulated objects are common in everyday life. Despite much progress in single-view 3D reconstruction, it is still challenging to infer an articulated 3D object model from an RGB video showing a person mani…