Govind Thattai
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge. As con…
Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations
Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions,…
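To make the conservatism idea above concrete, here is a minimal sketch (not this paper's method; the Q-table, dataset mask, and penalty coefficient are invented for illustration) of value estimation that penalizes state-action pairs absent from the offline dataset:

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
q_values = rng.normal(size=(n_states, n_actions))    # hypothetical learned Q(s, a) table
seen = np.zeros((n_states, n_actions), dtype=bool)   # (s, a) pairs present in the offline dataset
seen[[0, 1, 2, 3, 4], [0, 2, 1, 0, 2]] = True

penalty = 1.0                                         # conservatism coefficient (assumed value)
conservative_q = np.where(seen, q_values, q_values - penalty)

# The greedy policy is derived from the penalized values, so it avoids
# state-action pairs never observed in the data unless they still dominate.
greedy_actions = conservative_q.argmax(axis=1)
print(greedy_actions)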
LEMMA: Learning Language-Conditioned Multi-Robot Manipulation
Complex manipulation tasks often require robots with complementary capabilities to collaborate. We introduce a benchmark for LanguagE-Conditioned Multi-robot MAnipulation (LEMMA) focused on task allocation and long-horizon object manipulat…
Neural Architecture Search for Parameter-Efficient Fine-tuning of Large Pre-trained Language Models
Parameter-efficient tuning (PET) methods fit pre-trained language models (PLMs) to downstream tasks by either computing a small compressed update for a subset of model parameters, or appending and fine-tuning a small number of new model pa…
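As a rough illustration of the two PET families the abstract mentions (the shapes, module names, and ReLU bottleneck below are assumptions, not the architectures searched in this paper):

import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 768, 8

W_frozen = rng.normal(size=(d_model, d_model))   # pre-trained weight, kept fixed

# (1) compressed update: only A and B (2 * d_model * rank parameters) are trained
A = rng.normal(scale=0.01, size=(d_model, rank))
B = np.zeros((rank, d_model))
W_effective = W_frozen + A @ B                   # low-rank delta applied to the frozen weight

# (2) appended parameters: a small bottleneck module added after the frozen layer
W_down = rng.normal(scale=0.01, size=(d_model, rank))
W_up = np.zeros((rank, d_model))

def layer_with_adapter(x):
    h = x @ W_frozen                             # frozen backbone computation
    return h + np.maximum(h @ W_down, 0) @ W_up  # trainable bottleneck on a residual path

print(layer_with_adapter(rng.normal(size=(1, d_model))).shape)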
Alexa Arena: A User-Centric Interactive Platform for Embodied AI
We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. Alexa Arena provides a variety of multi-room layouts and interactable objects, for the creation of human-robot interaction (HRI) missions. With us…
Language-Informed Transfer Learning for Embodied Household Activities
For service robots to become general-purpose in everyday household environments, they need not only a large library of primitive skills, but also the ability to quickly learn novel tasks specified by users. Fine-tuning neural networks on a…
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
A key goal for the advancement of AI is to develop technologies that serve the needs not just of one group but of all communities regardless of their geographical region. In fact, a significant proportion of knowledge is locally shared by …
OpenD: A Benchmark for Language-Driven Door and Drawer Opening
We introduce OPEND, a benchmark for learning how to use a hand to open cabinet doors or drawers in a photo-realistic and physics-reliable simulation environment driven by language instruction. To solve the task, we propose a multi-step pla…
TPA-Net: Generate A Dataset for Text to Physics-based Animation
Recent breakthroughs in Vision-Language (V&L) joint research have achieved remarkable results in various text-driven tasks. High-quality Text-to-video (T2V), a task that has been long considered mission-impossible, was proven feasible with…
Towards Reasoning-Aware Explainable VQA
The domain of joint vision-language understanding, especially in the context of reasoning in Visual Question Answering (VQA) models, has garnered significant attention in the recent past. While most of the existing VQA models focus on impr…
CH-MARL: A Multimodal Benchmark for Cooperative, Heterogeneous Multi-Agent Reinforcement Learning
We propose a multimodal (vision-and-language) benchmark for cooperative and heterogeneous multi-agent learning. We introduce a benchmark multimodal dataset with tasks involving collaboration between multiple simulated heterogeneous robots …
DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following
Language-guided Embodied AI benchmarks requiring an agent to navigate an environment and manipulate objects typically allow one-way communication: the human user gives a natural language command to the agent, and the agent can only follow …
A Multi-level Alignment Training Scheme for Video-and-Language Grounding
To solve video-and-language grounding tasks, the key is for the network to understand the connection between the two modalities. For a pair of video and language description, their semantic relation is reflected by their encodings' similar…
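A minimal numpy sketch of similarity-based alignment in this spirit (the random embeddings, temperature, and symmetric contrastive loss are placeholders rather than the paper's multi-level training scheme):

import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
batch, dim = 4, 32
video_emb = l2_normalize(rng.normal(size=(batch, dim)))   # placeholder video encodings
text_emb = l2_normalize(rng.normal(size=(batch, dim)))    # placeholder description encodings

logits = video_emb @ text_emb.T / 0.07                    # pairwise similarities (assumed temperature)
labels = np.arange(batch)                                 # the i-th video matches the i-th description

def cross_entropy(logits, labels):
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Matched pairs are pushed toward high similarity, mismatched pairs toward low.
loss = 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
print(float(loss))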
Privacy Preserving Visual Question Answering
We introduce a novel privacy-preserving methodology for performing Visual Question Answering on the edge. Our method constructs a symbolic representation of the visual scene, using a low-complexity computer vision model that jointly predic…
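A toy sketch of the symbolic-scene idea, with invented object entries and question handlers rather than the paper's models: the image is reduced on-device to a symbolic representation (objects, attributes, relations), and only that structure, not the raw pixels, is used to answer the question.

scene = [
    {"object": "cup", "color": "red", "on": "table"},      # hypothetical detector output
    {"object": "book", "color": "blue", "on": "table"},
]

def answer(question, scene):
    # Toy question handlers that operate on the symbolic representation only.
    if question.startswith("how many"):
        target = question.split()[-1].rstrip("s?")
        return sum(1 for entry in scene if entry["object"] == target)
    if question.startswith("what color is the"):
        target = question.split()[-1].rstrip("?")
        matches = [entry["color"] for entry in scene if entry["object"] == target]
        return matches[0] if matches else "unknown"
    return "unknown"

print(answer("how many cups?", scene))           # -> 1
print(answer("what color is the book?", scene))  # -> blue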
Learning to Act with Affordance-Aware Multimodal Neural SLAM
Recent years have witnessed an emerging paradigm shift toward embodied artificial intelligence, in which an agent must learn to solve challenging tasks by interacting with its environment. There are several challenges in solving embodied m…
Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning
We present a two-step hybrid reinforcement learning (RL) policy that is designed to generate interpretable and robust hierarchical policies on the RL problem with graph-based input. Unlike prior deep reinforcement learning policies paramet…
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
Outside-knowledge visual question answering (OK-VQA) requires the agent to comprehend the image, make use of relevant knowledge from the entire web, and digest all the information to answer the question. Most previous works address the pro…
Best of Both Worlds: A Hybrid Approach for Multi-Hop Explanation with Declarative Facts
Language-enabled AI systems can answer complex, multi-hop questions with high accuracy, but supporting answers with evidence is a more challenging task that is important for transparency and trustworthiness to users. Prior work in this …
LUMINOUS: Indoor Scene Generation for Embodied AI Challenges
Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges on…
Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
Language-guided robots performing home and office tasks must navigate in and interact with the world. Grounding language instructions against visual observations and actions to take in an environment is an open challenge. We present Embodi…
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
GuessWhat?! is a two-player visual dialog guessing game where player A asks a sequence of yes/no questions (Questioner) and makes a final guess (Guesser) about a target object in an image, based on answers from player B (Oracle). Based on …
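A toy sketch of the GuessWhat?! interaction pattern described above, with the three roles stubbed by trivial rules instead of the learned agents studied in the paper:

objects = ["red cup", "blue cup", "red book"]
target = "blue cup"                                   # known only to the Oracle

def oracle(question):
    # Oracle: answers yes/no with respect to the hidden target object.
    return "yes" if question.rstrip("?").split()[-1] in target else "no"

def questioner(candidates):
    # Questioner: asks about one distinguishing word per turn (a stand-in for a learned policy).
    words = {w for obj in candidates for w in obj.split()}
    for word in sorted(words):
        yield f"is it {word}?"

candidates = list(objects)
for question in questioner(candidates):
    reply = oracle(question)
    word = question.rstrip("?").split()[-1]
    candidates = [o for o in candidates if (word in o) == (reply == "yes")]
    if len(candidates) == 1:
        break

print("guess:", candidates[0])                        # Guesser's final choice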
Are We There Yet? Learning to Localize in Embodied Instruction Following
Embodied instruction following is a challenging problem requiring an agent to infer a sequence of primitive actions to achieve a goal environment state from complex language and visual inputs. Action Learning From Realistic Environments an…
Interactive Teaching for Conversational AI
Current conversational AI systems aim to understand a set of pre-designed requests and execute related actions, which limits them to evolve naturally and adapt based on human interactions. Motivated by how children learn their first langua…
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering
The predominant approach to visual question answering (VQA) relies on encoding the image and question with a "black-box" neural encoder and decoding a single token as the answer like "yes" or "no". Despite this approach's strong quantitati…