Oliver Lemon
Improving Cooperation in Collaborative Embodied AI
The integration of Large Language Models (LLMs) into multiagent systems has opened new possibilities for collaborative reasoning and cooperation with AI agents. This paper explores different prompting methods and evaluates their effectiven…
Playpen: An Environment for Exploring Learning Through Conversational Interaction
Interaction between learner and feedback-giver has come into focus recently for post-training of Large Language Models (LLMs), through the use of reward models that judge the appropriateness of a model's response. In this paper, we investi…
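The visible abstract describes post-training driven by a feedback-giver: a reward model judges the appropriateness of a learner's responses. A minimal sketch of such an interaction loop is below; `learner_respond` and `reward_model_score` are hypothetical stand-ins, not the paper's actual components.

```python
# Minimal learner/feedback-giver loop: roll out responses, attach a scalar
# reward, and collect (prompt, response, reward) triples of the kind that
# post-training methods such as rejection sampling or RLHF consume.
from dataclasses import dataclass

@dataclass
class Turn:
    prompt: str
    response: str
    reward: float

def learner_respond(prompt: str) -> str:
    # Placeholder for a call to the learner LLM.
    return f"(response to: {prompt})"

def reward_model_score(prompt: str, response: str) -> float:
    # Placeholder for a learned reward model judging appropriateness.
    return float(len(response) > 0)

def collect_interaction_data(prompts: list[str]) -> list[Turn]:
    """One interaction episode per prompt, scored by the reward model."""
    turns = []
    for prompt in prompts:
        response = learner_respond(prompt)
        reward = reward_model_score(prompt, response)
        turns.append(Turn(prompt, response, reward))
    return turns

if __name__ == "__main__":
    for t in collect_interaction_data(["Name a primary colour."]):
        print(t)
```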
NLP Verification: Towards a General Methodology for Certifying Robustness
Machine learning has exhibited substantial success in the field of natural language processing (NLP). For example, large language models have empirically proven to be capable of producing text of high complexity and cohesion. However, at t…
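Certifying robustness typically means proving that no perturbation within some bound can flip a classifier's prediction. The toy sketch below uses interval bound propagation (IBP), one standard certification technique, on a random two-layer ReLU network over an embedding; it illustrates the shape of such a certificate, not the methodology the paper proposes.

```python
# Toy interval bound propagation (IBP) over a two-layer ReLU classifier on a
# sentence embedding. Weights are random: this shows what a robustness
# certificate checks, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(2, 16)), np.zeros(2)

def ibp_bounds(lo, hi, W, b):
    """Propagate elementwise input bounds [lo, hi] through y = W @ x + b."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def certified_class(x, eps):
    """Return the predicted class if it provably cannot change under any
    perturbation with ||delta||_inf <= eps, else None (two-class case)."""
    lo, hi = x - eps, x + eps
    lo, hi = ibp_bounds(lo, hi, W1, b1)
    lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)   # ReLU is monotone
    lo, hi = ibp_bounds(lo, hi, W2, b2)
    pred = int(np.argmax(W2 @ np.maximum(W1 @ x + b1, 0) + b2))
    other = 1 - pred
    # Certified iff the worst-case score of `pred` still beats the
    # best-case score of the other class.
    return pred if lo[pred] > hi[other] else None

x = rng.normal(size=8)
print(certified_class(x, eps=0.01))  # class index, or None if not certifiable
```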
Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests
We examine three evaluation paradigms: standard benchmarks (e.g., MMLU and BBH), interactive games (e.g., Signalling Games or Taboo), and cognitive tests (e.g., for working memory or theory of mind). First, we investigate which of the form…
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
This study explores replacing Transformers in Visual Language Models (VLMs) with Mamba, a recent structured state space model (SSM) that demonstrates promising performance in sequence modeling. We test models up to 3B parameters under cont…
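The core architectural swap being compared can be illustrated by feeding one multimodal token sequence to interchangeable sequence mixers. In the toy PyTorch sketch below, `LinearRecurrence` is a crude gated-recurrence stand-in for an SSM such as Mamba, not an implementation of it, and neither module is the paper's model.

```python
# The same multimodal sequence (visual prefix + text embeddings) is mixed
# either by a Transformer layer or by a recurrent stand-in for an SSM.
import torch
import torch.nn as nn

class LinearRecurrence(nn.Module):
    """h_t = sigmoid(a) * h_{t-1} + x_t, applied channelwise: a minimal
    state-space-flavoured mixer with O(length) sequential cost."""
    def __init__(self, dim):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(dim))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                  # x: (batch, seq, dim)
        decay = torch.sigmoid(self.a)
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):
            h = decay * h + x[:, t]
            outs.append(h)
        return self.proj(torch.stack(outs, dim=1))

def run_vlm_mixer(visual_prefix, text_emb, mixer):
    """Prepend visual tokens to text tokens and mix the joint sequence."""
    seq = torch.cat([visual_prefix, text_emb], dim=1)
    return mixer(seq)

dim = 32
visual = torch.randn(2, 8, dim)            # e.g. 8 visual prompt tokens
text = torch.randn(2, 20, dim)             # 20 text-token embeddings
transformer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
ssm_like = LinearRecurrence(dim)
print(run_vlm_mixer(visual, text, transformer).shape)  # (2, 28, 32)
print(run_vlm_mixer(visual, text, ssm_like).shape)     # (2, 28, 32)
```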
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding
AI personal assistants deployed via robots or wearables require embodied understanding to collaborate with humans effectively. However, current Vision-Language Models (VLMs) primarily focus on third-person view videos, neglecting the richn…
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers
An effective method for combining frozen large language models (LLMs) and visual encoders involves a resampler module that creates a 'visual prompt' which is provided to the LLM, along with the textual prompt. While this approach has enable…
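The resampler setup the abstract describes can be sketched as a set of learned queries cross-attending to frozen visual features, with the output prepended to the text embeddings. The Perceiver-style toy below is illustrative only, not the pretrained resamplers the paper probes.

```python
# Minimal Perceiver-style resampler: learned queries cross-attend to frozen
# image features; the result becomes a fixed-length "visual prompt" that is
# prepended to the text embeddings before they reach the frozen LLM.
import torch
import torch.nn as nn

class Resampler(nn.Module):
    def __init__(self, dim, num_queries=8, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, visual_feats):       # (batch, num_patches, dim)
        q = self.queries.expand(visual_feats.size(0), -1, -1)
        visual_prompt, _ = self.attn(q, visual_feats, visual_feats)
        return visual_prompt                # (batch, num_queries, dim)

dim = 64
resampler = Resampler(dim)
visual_feats = torch.randn(2, 196, dim)    # frozen ViT patch features
text_emb = torch.randn(2, 12, dim)         # text-token embeddings
llm_input = torch.cat([resampler(visual_feats), text_emb], dim=1)
print(llm_input.shape)                     # (2, 20, 64), fed to the frozen LLM
```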
Visually Grounded Language Learning: A Review of Language Games, Datasets, Tasks, and Models
In recent years, several machine learning models have been proposed. They are trained with a language modelling objective on large-scale text-only data. With such pretraining, they can achieve impressive results on many Natural Language Un…
RECANTFormer: Referring Expression Comprehension with Varying Numbers of Targets
The Generalized Referring Expression Comprehension (GREC) task extends classic REC by generating image bounding boxes for objects referred to in natural language expressions, which may indicate zero, one, or multiple targets. This generali…
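Handling a variable number of targets changes the evaluation logic: a prediction must be scored against zero, one, or many gold boxes. The sketch below uses greedy IoU matching and treats the zero-target case explicitly; it is generic illustration, not the official GREC metric.

```python
# Greedy IoU matching of predicted boxes against a variable-size gold set,
# with the zero-target case handled explicitly.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match_boxes(pred, gold, thresh=0.5):
    """Return (true positives, false positives, false negatives).
    An expression with no referent is correct iff `pred` is empty."""
    if not gold:
        return (0, len(pred), 0)
    unmatched = list(gold)
    tp = 0
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched.remove(best)
            tp += 1
    return (tp, len(pred) - tp, len(unmatched))

print(match_boxes([(0, 0, 10, 10)], [(1, 1, 10, 10)]))  # (1, 0, 0)
print(match_boxes([(0, 0, 10, 10)], []))                # zero targets: (0, 1, 0)
```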
Keynote presentation - Conversations with robots and AIs - Can foundation models support human wellbeing?
Drawing on examples from several research projects at the National Robotarium and Alana AI, including SPRING (social robots for elder care), RES-Q+ (spoken dialogue to support stroke patients), and RNIB (assistive visual dialogue for parti…
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation. To tackle th…
Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Process…
Detecting Agreement in Multi-party Conversational AI
Today, conversational systems are expected to handle conversations in multi-party settings, especially within Socially Assistive Robots (SARs). However, practical usability remains difficult as there are additional challenges to overcome, …
Detecting agreement in multi-party dialogue: evaluating speaker diarisation versus a procedural baseline to enhance user engagement
Conversational agents participating in multi-party interactions face significant challenges in dialogue state tracking, since the identity of the speaker adds significant contextual meaning. It is common to utilise diarisation models to id…
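The glue step between a diarisation model and transcription can be sketched as assigning each ASR segment the speaker whose diarised turn overlaps it most in time, so the dialogue state tracker knows who said each utterance. The data structures below are hypothetical, not the paper's pipeline.

```python
# Attribute speakers to ASR segments by maximal temporal overlap with
# diarised speaker turns.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float
    end: float
    text: str = ""
    speaker: str = ""

def overlap(a: Segment, b: Segment) -> float:
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def attribute_speakers(asr: list[Segment], diarised: list[Segment]) -> list[Segment]:
    """Label each ASR segment with the most-overlapping diarised speaker."""
    for seg in asr:
        best = max(diarised, key=lambda d: overlap(seg, d), default=None)
        seg.speaker = best.speaker if best and overlap(seg, best) > 0 else "unknown"
    return asr

asr = [Segment(0.0, 2.1, "I'd like an appointment"), Segment(2.3, 3.0, "Me too")]
turns = [Segment(0.0, 2.2, speaker="patient"), Segment(2.2, 3.5, speaker="companion")]
for seg in attribute_speakers(asr, turns):
    print(seg.speaker, "->", seg.text)
```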
Building for Speech: Designing the Next Generation of Social Robots for Audio Interaction
There have been incredible advancements in robotics and spoken dialogue systems (SDSs) over the past few years, yet we still don't find social robots in public spaces like train stations, shopping malls, or hospital waiting rooms. In this …
FurChat: An Embodied Conversational Agent using LLMs, Combining Open and Closed-Domain Dialogue with Facial Expressions
We demonstrate an embodied conversational agent that can function as a receptionist and generate a mixture of open and closed-domain dialogue along with facial expressions, by using a large language model (LLM) to develop an engaging conve…
Neeraj Cherakara, Finny Varghese, Sheena Shabana, Nivan Nelson, Abhiram Karukayil, Rohith Kulothungan, Mohammed Afil Farhan, Birthe Nesset, Meriam Moujahid, Tanvi Dinkar, Verena Rieser, Oliver Lemon. Proceedings of the 24th Meeting of the …
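One simple way to pair generated dialogue with facial expressions is to have the LLM emit an inline expression tag that is stripped before speech synthesis. The tag convention below is an assumption for illustration, not FurChat's actual output protocol.

```python
# Split an LLM reply like "[smile] Hello there!" into an expression tag for
# the robot's face controller and an utterance for text-to-speech. The tag
# format is a hypothetical convention, not FurChat's real protocol.
import re

KNOWN_EXPRESSIONS = {"smile", "neutral", "surprise", "sad"}

def parse_reply(raw: str) -> tuple[str, str]:
    """Return (expression, utterance); fall back to a neutral face if the
    tag is missing or unknown."""
    m = re.match(r"\s*\[(\w+)\]\s*(.*)", raw, flags=re.DOTALL)
    if m and m.group(1).lower() in KNOWN_EXPRESSIONS:
        return m.group(1).lower(), m.group(2)
    return "neutral", raw.strip()

expression, utterance = parse_reply("[smile] Welcome to the lab, how can I help?")
print(expression)   # -> sent to the face controller
print(utterance)    # -> sent to text-to-speech
```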
Multi-party Goal Tracking with LLMs: Comparing Pre-training, Fine-tuning, and Prompt Engineering
This paper evaluates the extent to which current Large Language Models (LLMs) can capture task-oriented multi-party conversations (MPCs). We have recorded and transcribed 29 MPCs between patients, their companions, and a social robot in a …
Angus Addlesee, Weronika Sieińska, Nancie Gunson, Daniel Hernandez Garcia, Christian Dondrup, Oliver Lemon. Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue. 2023.
SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation
SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has alread…
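Recasting sub-tasks as sequence prediction hinges on serialising the symbolic scene into the same token stream as the dialogue context. The sketch below invents a token format for illustration; it is not SimpleMTOD's actual scheme.

```python
# Serialise a symbolic scene (object indices, types, attributes) into a flat
# token string that an autoregressive LM can consume alongside the dialogue.
# The token format here is invented for illustration.

def serialise_scene(objects: list[dict]) -> str:
    """Turn [{'id': 3, 'type': 'jacket', 'color': 'red'}, ...] into
    '<SOO> <3> red jacket <EOO> ...' style tokens."""
    parts = []
    for obj in objects:
        attrs = " ".join(str(v) for k, v in sorted(obj.items()) if k != "id")
        parts.append(f"<SOO> <{obj['id']}> {attrs} <EOO>")
    return " ".join(parts)

scene = [
    {"id": 3, "type": "jacket", "color": "red"},
    {"id": 7, "type": "shelf", "color": "white"},
]
context = serialise_scene(scene) + " <USER> do you have that in blue?"
print(context)  # a single token stream: one input for sequence prediction
```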
Identifying Challenges and Opportunities for Intelligent Data-Driven Health Interfaces to Support Ongoing Care
This workshop will explore future work in the area of intelligent, conversational, data-driven health interfaces both from patients’ and health care professionals’ perspectives. We aim to bring together a diverse set of experts and stakeho…