Exploring foci of:
arXiv (Cornell University)
From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models
September 2024 • Tomi Pulli, Stefan Thalhammer, Simon Schwaiger, Markus Vincze
Robots are increasingly envisioned to interact in real-world scenarios, where they must continuously adapt to new situations. To detect and grasp novel objects, zero-shot pose estimators determine poses without prior knowledge. Recently, vision language models (VLMs) have shown considerable advances in robotics applications by establishing an understanding between language input and image input. In our work, we take advantage of VLMs' zero-shot capabilities and translate this ability to 6D object pose estimation. W…
General-Purpose Machine Gun
Artificial Intelligence
Computer Science
Computer Vision
Estimation
Engineering
Systems Engineering