Mark Gales
Data Augmentation for Spoken Grammatical Error Correction
While there exist strong benchmark datasets for grammatical error correction (GEC), high-quality annotated spoken datasets for Spoken GEC (SGEC) are still under-resourced. In this paper, we propose a fully automated method to generate audi…
Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs
Vocabulary use is a fundamental aspect of second language (L2) proficiency. To date, its assessment by automated systems has typically examined the context-independent, or part-of-speech (PoS) related use of words. This paper introduces a …
Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction
Spoken Grammatical Error Correction (SGEC) and Feedback (SGECF) are crucial for second language learners, teachers and test takers. Traditional SGEC systems rely on a cascaded pipeline consisting of an ASR, a module for disfluency detectio…
Assessment of L2 Oral Proficiency using Speech Large Language Models
The growing population of L2 English speakers has increased the demand for developing automatic graders for spoken language assessment (SLA). Historically, statistical models, text encoders, and self-supervised speech models have been util…
Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge
This paper explores generalised probabilistic modelling and uncertainty estimation in comparative LLM-as-a-judge frameworks. We show that existing Product-of-Experts methods are specific cases of a broader framework, enabling diverse model…
Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs
The combination of pre-trained speech encoders with large language models has enabled the development of speech LLMs that can handle a wide range of spoken language processing tasks. While these models are powerful and flexible, this very …
Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?
Unlearning has emerged as a critical capability for large language models (LLMs) to support data privacy, regulatory compliance, and ethical AI deployment. Recent techniques often rely on obfuscation by injecting incorrect or irrelevant in…
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
There is a growing abundance of publicly available or company-owned audio/video archives, highlighting the increasing importance of efficient access to desired content and information retrieval from these archives. This paper investigates …
Beyond COVID-19, the case for collecting, analysing and using sex-disaggregated data and gendered data to inform outbreak response: a scoping review
Introduction: Understanding sex and gender differences during outbreaks is critical to delivering an effective response. Although recommendations and minimum requirements exist, the incorporation of sex-disaggregated data and gender analysi…
Speak & Improve Corpus 2025: an L2 English Speech Corpus for Language Assessment and Feedback
We introduce the Speak & Improve Corpus 2025, a dataset of L2 learner English data with holistic scores and language error annotation, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The aim of th…
Speak & Improve Challenge 2025: Tasks and Baseline Systems
This paper presents the "Speak & Improve Challenge 2025: Spoken Language Assessment and Feedback" -- a challenge associated with the ISCA SLaTE 2025 Workshop. The goal of the challenge is to advance research on spoken language assessment a…
Zero-Shot Audio Topic Reranking Using Large Language Models
Multimodal Video Search by Examples (MVSE) investigates using video clips as the query term for information retrieval, rather than the more traditional text query. This enables far richer search modalities such as images, speaker, content,…
A recurrent neural network and parallel hidden Markov model algorithm to segment and detect heart murmurs in phonocardiograms
The detection of heart disease using a stethoscope requires significant skill and time, making it expensive and impractical for widespread screening in low-resource environments. Machine learning analysis of heart sound recordings can impr…
Structural-based uncertainty in deep learning across anatomical scales: Analysis in white matter lesion segmentation
This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools in the context of white matter lesion (WML) segmentation from magnetic resonance imaging (MRI) scans of multip…
Can GPT-4 do L2 analytic assessment?
Automated essay scoring (AES) to evaluate second language (L2) proficiency has been a firmly established technology used in educational contexts for decades. Although holistic scoring has seen advancements in AES that match or even exceed …
SkillAggregation: Reference-free LLM-Dependent Aggregation
Large Language Models (LLMs) are increasingly used to assess NLP tasks due to their ability to generate human-like judgments. Single LLMs were used initially; however, recent work suggests using multiple LLMs as judges yields improved perf…
Finetuning LLMs for Comparative Assessment Tasks
Automated assessment in natural language generation is a challenging task. Instruction-tuned large language models (LLMs) have shown promise in reference-free evaluation, particularly through comparative assessment. However, the quadratic …
ASR Error Correction using Large Language Models
Error correction (EC) models play a crucial role in refining Automatic Speech Recognition (ASR) transcriptions, enhancing the readability and quality of transcriptions. Without requiring access to the underlying code or model weights, EC c…
Grammatical Error Feedback: An Implicit Evaluation Approach
Grammatical feedback is crucial for consolidating second language (L2) learning. Most research in computer-assisted language learning has focused on feedback through grammatical error correction (GEC) systems, rather than examining more ho…
Multi‐modal video search by examples—A video quality impact analysis
As video content continues to proliferate and many video archives lack suitable metadata, video retrieval, particularly through example‐based search, has become increasingly crucial. Existing metadata often fails to meet …
Learn and Don't Forget: Adding a New Language to ASR Foundation Models
Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language, while maintaining performance on the original language …
Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models
Speech-enabled foundation models, either in the form of flexible speech recognition based systems or audio-prompted large language models (LLMs), are becoming increasingly popular. One of the interesting aspects of these models is their ab…
Cross-Lingual Transfer Learning for Speech Translation
There has been increasing interest in building multilingual foundation models for NLP and speech research. This paper examines how to expand the speech translation capability of these models with restricted data. Whisper, a speech foundati…
MVRMLM 2024: Multimodal Video Retrieval and Multimodal Language Modelling
As video content continues to proliferate and many video archives lack suitable metadata, video retrieval, particularly through example-based search, has become increasingly crucial. Existing metadata often fails to meet …
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models
Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information. Given the diversity in architectures, training data and instruction tuning techniques,…
Question-Based Retrieval using Atomic Units for Enterprise RAG
Enterprise retrieval augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Relevant ch…