From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models

arXiv (Cornell University) • September 2024 • Tomi Pulli, Stefan Thalhammer, Simon Schwaiger, Markus Vincze

Robots are increasingly envisioned to interact in real-world scenarios, where they must continuously adapt to new situations. To detect and grasp novel objects, zero-shot pose estimators determine poses without prior knowledge. Recently, vision language models (VLMs) have shown considerable advances in robotics applications by establishing an understanding between language input and image input. In our work, we take advantage of VLMs' zero-shot capabilities and translate this ability to 6D object pose estimation. W…

Topics: General-Purpose Machine Gun, Artificial Intelligence, Computer Science, Computer Vision, Estimation, Engineering, Systems Engineering