Manolis Savva
MLFM: Multi-Layered Feature Maps for Richer Language Understanding in Zero-Shot Semantic Navigation
Recent progress in large vision-language models has driven improvements in language-based semantic navigation, where an embodied agent must reach a target object described in natural language. Yet we still lack a clear, language-focused ev…
Survey on Modeling of Human-made Articulated Objects
3D modeling of articulated objects is a research problem within computer vision, graphics, and robotics. Its objective is to understand the shape and motion of the articulated components, represent the geometry and mobility of object parts…
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
Despite recent advances in text-conditioned 3D indoor scene generation, there remain gaps in the evaluation of these methods. Existing metrics primarily assess the realism of generated scenes by comparing them to a set of ground-truth scen…
Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling
Reconstructing structured 3D scenes from RGB images using CAD objects unlocks efficient and compact scene representations that maintain compositionality and interactability. Existing works propose training-heavy methods relying on either e…
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
We address the challenge of creating 3D assets for household articulated objects from a single image. Prior work on articulated object creation either requires multi-view multi-state input, or only allows coarse control over the generation…
S2O: Static to Openable Enhancement for Articulated 3D Objects
Despite much progress in large 3D datasets, there are currently few interactive 3D object datasets, and their scale is limited due to the manual effort required in their construction. We introduce the static to openable (S2O) task which cre…
SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements
Despite advances in text-to-3D generation methods, generation of multi-object arrangements remains challenging. Current methods exhibit failures in generating physically plausible arrangements that respect the provided text description. We…
Text-to-3D Shape Generation
Recent years have seen an explosion of work and interest in text‐to‐3D shape generation. Much of the progress is driven by advances in 3D representations, large‐scale pretraining and representation learning for text and image data enabling…
R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding
We introduce the Reality-linked 3D Scenes (R3DS) dataset of synthetic 3D scenes mirroring the real-world scene arrangements from Matterport3D panoramas. Compared to prior work, R3DS has more complete and densely populated scenes with objec…
Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects
Single-view 3D shape retrieval is a challenging task that is increasingly important with the growth of available 3D data. Prior work that has studied this task has not focused on evaluating how realistic occlusions impact performance, and …
CAGE: Controllable Articulation GEneration
We address the challenge of generating 3D articulated objects in a controllable fashion. Currently, modeling articulated 3D objects is either achieved through laborious manual authoring, or using methods from prior work that are hard to sc…
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dan…
ShapeNet: An Information-Rich 3D Model Repository
We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a c…
LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning
In end-to-end autonomous driving, the utilization of existing sensor fusion techniques and navigational control methods for imitation learning proves inadequate in challenging situations that involve numerous dynamic agents. To address thi…
Advances in Data-Driven Analysis and Synthesis of 3D Indoor Scenes
This report surveys advances in deep learning‐based modelling techniques that address four different 3D indoor scene analysis tasks, as well as synthesis of 3D indoor scenes. We describe different kinds of representations for indoor scenes…
PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects
We address the task of simultaneous part-level reconstruction and motion parameter estimation for articulated objects. Given two sets of multi-view images of an object in two static articulation states, we decouple the movable part from th…
HomeRobot: Open-Vocabulary Mobile Manipulation
HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen…
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
We contribute the Habitat Synthetic Scene Dataset, a dataset of 211 high-quality 3D scenes, and use it to test navigation agent generalization to realistic 3D environments. Our dataset represents real interiors and contains a diverse set o…
Evaluating 3D Shape Analysis Methods for Robustness to Rotation Invariance
This paper analyzes the robustness of recent 3D shape descriptors to SO(3) rotations, something that is fundamental to shape modeling. Specifically, we formulate the task of rotated 3D object instance detection. To do so, we consider a dat…
MOPA: Modular Object Navigation with PointGoal Agents
We propose a simple but effective modular approach MOPA (Modular ObjectNav with PointGoal agents) to systematically investigate the inherent modularity of the object navigation task in Embodied AI. MOPA consists of four modules: (a) an obj…
OPDMulti: Openable Part Detection for Multiple Objects
Openable part detection is the task of detecting the openable parts of an object in a single-view image, and predicting corresponding motion parameters. Prior work investigated the unrealistic setting where all input images only contain a …
Emergence of Maps in the Memories of Blind Navigation Agents
Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit …
Retrospectives on the Embodied AI Workshop
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement,…
Habitat-Matterport 3D Semantics Dataset
We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. HM3DSEM is the largest dataset of 3D real-world spaces with densely annotated semantics that is currently available to the academic community. It consists of 142,646 object …
Articulated 3D Human-Object Interactions from RGB Videos: An Empirical Analysis of Approaches and Challenges
Human-object interactions with articulated objects are common in everyday life. Despite much progress in single-view 3D reconstruction, it is still challenging to infer an articulated 3D object model from an RGB video showing a person mani…