arXiv (Cornell University)
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
August 2021 • Alessandro Suglia, Qiaozi Gao, Jesse Thomason, Govind Thattai, Gaurav S. Sukhatme
Language-guided robots performing home and office tasks must navigate in and interact with the world. Grounding language instructions against visual observations and actions to take in an environment is an open challenge. We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion. Additionally, we bridge the gap between successful object-centric navigation models used for non-interactiv…
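The abstract's core idea, a transformer that attends over language and visual inputs in one sequence, can be sketched minimally. This is not the authors' EmBERT code; it is a hypothetical illustration in NumPy with assumed dimensions, showing how language-token and visual-object embeddings are concatenated so self-attention mixes both modalities.

```python
import numpy as np

# Hypothetical sketch (not the paper's implementation): one self-attention
# layer over a joint sequence of language and visual tokens, illustrating
# how a transformer can attend across modalities.
rng = np.random.default_rng(0)
d = 16                            # shared embedding dimension (assumed)
lang = rng.normal(size=(5, d))    # 5 language-token embeddings (assumed)
vis = rng.normal(size=(3, d))     # 3 visual object embeddings (assumed)

def self_attention(x, d_k):
    """Scaled dot-product self-attention with identity projections."""
    scores = x @ x.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ x

tokens = np.concatenate([lang, vis], axis=0)  # (8, d) multimodal sequence
out = self_attention(tokens, d)
print(out.shape)  # (8, 16): each output row mixes language and visual tokens
```

In the full model this layer would be stacked, use learned query/key/value projections, and run over observations across the episode's temporal horizon; the sketch only shows the joint-sequence attention pattern.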