Rajiv Mathews
Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World
This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting spe…
Recycling Scraps: Improving Private Learning by Leveraging Checkpoints
DP training pipelines for modern neural networks are iterative and generate multiple checkpoints. However, all except the final checkpoint are discarded after training. In this work, we propose novel methods to utilize intermediate checkpo…
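A minimal sketch of the core idea, assuming a uniform average over the tail of saved checkpoints (the paper studies several aggregation schemes; the function name and checkpoint format here are illustrative):

    import numpy as np

    def average_checkpoints(checkpoints, k):
        """Uniformly average corresponding parameter arrays of the last k checkpoints."""
        tail = checkpoints[-k:]
        return [np.mean(np.stack(layers), axis=0) for layers in zip(*tail)]

    # Toy example: three checkpoints of a two-layer model.
    ckpts = [
        [np.array([1.0, 2.0]), np.array([[0.5]])],
        [np.array([1.2, 1.8]), np.array([[0.7]])],
        [np.array([0.8, 2.2]), np.array([[0.6]])],
    ]
    print(average_checkpoints(ckpts, k=3))  # element-wise mean across checkpoints

Because DP noise is added independently at each step, averaging nearby checkpoints tends to cancel some of that noise without spending additional privacy budget.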
Parameter-Efficient Transfer Learning under Federated Learning for Automatic Speech Recognition
This work explores the challenge of enhancing Automatic Speech Recognition (ASR) model performance across various user-specific domains while preserving user data privacy. We employ federated learning and parameter-efficient domain adaptat…
Learning from straggler clients in federated learning
How well do existing federated learning algorithms learn from client devices that return model updates with a significant time delay? Is it even possible to learn effectively from clients that report back minutes, hours, or days after bein…
Unintended Memorization in Large ASR Models, and How to Mitigate It
It is well-known that neural networks can unintentionally memorize their training examples, causing privacy concerns. However, auditing memorization in large non-auto-regressive automatic speech recognition (ASR) models has been challengin…
Heterogeneous Federated Learning Using Knowledge Codistillation
Federated Averaging, and many federated learning algorithm variants which build upon it, have a limitation: all clients must share the same model architecture. This results in unused modeling capacity on many clients, which limits model pe…
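A minimal sketch of codistillation between two differently sized models, assuming knowledge is exchanged via predictions on a shared unlabeled transfer set rather than by averaging weights (the temperature and loss form are illustrative assumptions):

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distill_loss(student_logits, teacher_logits, temperature=2.0):
        """Cross-entropy of student predictions against the teacher's
        temperature-softened predictions on unlabeled transfer data."""
        t = softmax(teacher_logits / temperature)
        s = softmax(student_logits / temperature)
        return -(t * np.log(s + 1e-12)).sum(axis=-1).mean()

    # Two models of different capacity score the same unlabeled batch;
    # each can use the other's outputs as a distillation target.
    big = np.random.randn(4, 10)     # logits from the larger model
    small = np.random.randn(4, 10)   # logits from the smaller model
    print(distill_loss(small, big))  # small distills from big
    print(distill_loss(big, small))  # and vice versa (codistillation)

Since only predictions cross the boundary, clients with different architectures (and different capacities) can still learn from each other.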
The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning
Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated and stale. In the context of models trained on the s…
Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints
In this work, we focus on improving the accuracy-variance trade-off for state-of-the-art differentially private machine learning (DP ML) methods. First, we design a general framework that uses aggregates of intermediate checkpoints d…
Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning
Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide p…
UserLibri: A Dataset for ASR Personalization Using Only Text
Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized …
Mixed Federated Learning: Joint Decentralized and Centralized Learning
Federated learning (FL) enables learning from decentralized privacy-sensitive data, with computations on raw data confined to take place at edge clients. This paper introduces mixed FL, which incorporates an additional loss term calculated…
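A minimal sketch of one way a mixed-FL server step could combine the two signals, assuming the federated update is mixed with a gradient step computed at the datacenter on centralized data (the mixing weight, learning rate, and function names are illustrative assumptions, not the paper's exact algorithm):

    import numpy as np

    def mixed_update(client_updates, client_weights, datacenter_grad,
                     alpha=0.8, lr=0.1):
        """Mix the weighted average of client model deltas with a
        gradient step on centralized (datacenter) data."""
        w = np.asarray(client_weights, dtype=float)
        w = w / w.sum()
        fed_delta = sum(wi * ui for wi, ui in zip(w, client_updates))
        dc_delta = -lr * datacenter_grad
        return alpha * fed_delta + (1.0 - alpha) * dc_delta

    # Toy example with a 3-parameter model.
    updates = [np.array([0.1, -0.2, 0.0]), np.array([0.3, 0.1, -0.1])]
    print(mixed_update(updates, [10, 30],
                       datacenter_grad=np.array([0.5, 0.0, -0.5])))

The centralized term lets the model see data domains that never appear on client devices, which is the distribution-shift motivation in the abstract.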
Online Model Compression for Federated Learning with Large Models
This paper addresses the challenges of training large neural network models under federated learning settings: high on-device memory usage and communication cost. The proposed Online Model Compression (OMC) provides a framework that stores…
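A minimal sketch of the store-compressed, decompress-on-use pattern, assuming a simple uint8 affine quantizer (an illustrative choice, not necessarily the paper's exact scheme):

    import numpy as np

    def quantize(w):
        """Map float32 weights to uint8 plus (offset, scale) metadata."""
        lo, hi = float(w.min()), float(w.max())
        scale = (hi - lo) / 255.0
        if scale == 0.0:
            scale = 1.0  # constant tensor: avoid divide-by-zero
        q = np.round((w - lo) / scale).astype(np.uint8)
        return q, lo, scale

    def dequantize(q, lo, scale):
        return q.astype(np.float32) * scale + lo

    w = np.random.randn(256, 256).astype(np.float32)
    q, lo, scale = quantize(w)        # stored form: 1 byte per parameter
    w_hat = dequantize(q, lo, scale)  # materialized only when a layer is used
    print(np.abs(w - w_hat).max())    # small per-weight quantization error

Keeping parameters compressed at rest cuts on-device memory roughly 4x for this quantizer, at the cost of dequantizing each layer just before it is used.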
Detecting Unintended Memorization in Language-Model-Fused ASR
End-to-end (E2E) models are often paired with language models (LMs) via shallow fusion to boost overall quality as well as recognition of rare words. At the same time, several prior works show that LMs are susceptible to…
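A minimal sketch of shallow fusion itself: at decoding time, the E2E model's log-probability for each candidate token is interpolated with an external LM's log-probability. The weight lambda and the toy distributions below are illustrative:

    import numpy as np

    def shallow_fusion_score(e2e_logprobs, lm_logprobs, lam=0.3):
        """Combined per-token score used to rank beam-search hypotheses."""
        return e2e_logprobs + lam * lm_logprobs

    e2e = np.log(np.array([0.45, 0.40, 0.15]))  # E2E model over 3 tokens
    lm = np.log(np.array([0.10, 0.70, 0.20]))   # LM favors the rarer word
    print(shallow_fusion_score(e2e, lm).argmax())  # -> 1: LM shifts the fused choice

Because the fused output reflects the LM's preferences, anything the LM memorized from its training text can surface in recognition results, which is the leakage channel the paper audits.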
Extracting Targeted Training Data from ASR Models, and How to Mitigate It
Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate informati…
Production federated keyword spotting via distillation, filtering, and joint federated-centralized training
We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device tra…
Scaling Language Model Size in Cross-Device Federated Learning
Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train lar…
Capitalization Normalization for Language Modeling with an Accurate and Efficient Hierarchical RNN Model
Capitalization normalization (truecasing) is the task of restoring the correct case (uppercase or lowercase) of noisy text. We propose a fast, accurate and compact two-level hierarchical word-and-character-based recurrent neural network mo…
Scaling Language Model Size in Cross-Device Federated Learning
Jae Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Suresh, Shankar Kumar, Rajiv Mathews. Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022). 2022.
Public Data-Assisted Mirror Descent for Private Model Training
In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy conc…
Jointly Learning from Decentralized (Federated) and Centralized Data to Mitigate Distribution Shift
With privacy as a motivation, Federated Learning (FL) is an increasingly used paradigm where learning takes place collectively on edge devices, each with a cache of user-generated training examples that remain resident on the local device.…
Revealing and Protecting Labels in Distributed Training
Distributed learning paradigms such as federated learning often involve transmission of model updates, or gradients, over a network, thereby avoiding transmission of private data. However, it is possible for sensitive information about the…
Position-Invariant Truecasing with a Word-and-Character Hierarchical Recurrent Neural Network
Truecasing is the task of restoring the correct case (uppercase or lowercase) of noisy text generated either by an automatic system for speech recognition or machine translation or by humans. It improves the performance of downstream NLP t…
A Method to Reveal Speaker Identity in Distributed ASR Training, and How to Counter It
End-to-end Automatic Speech Recognition (ASR) models are commonly trained over spoken utterances using optimization methods like Stochastic Gradient Descent (SGD). In distributed settings like Federated Learning, model training requires tr…
Communication-Efficient Agnostic Federated Averaging
In distributed learning settings such as federated learning, the training algorithm can be potentially biased towards different clients. Mohri et al. (2019) proposed a domain-agnostic learning algorithm, where the model is optimized for an…
Understanding Unintended Memorization in Language Models Under Federated Learning
Recent works have shown that language models (LMs), e.g., for next word prediction (NWP), have a tendency to memorize rare or unique sequences in the training data. Since useful LMs are often trained on sensitive data, it is critical to id…
Training Keyword Spotting Models on Non-IID Data with Federated Learning
We demonstrate that a production-quality keyword-spotting model can be trained on-device using federated learning and achieve comparable false accept and false reject rates to a centrally-trained model. To overcome the algorithmic constrai…
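A minimal sketch of federated averaging (FedAvg), the aggregation rule underlying this on-device training setup: the server replaces the global model with the example-count-weighted mean of client models. Shapes and counts below are illustrative:

    import numpy as np

    def fedavg(client_models, num_examples):
        """Weighted average of client model parameters."""
        w = np.asarray(num_examples, dtype=float)
        w = w / w.sum()
        return sum(wi * m for wi, m in zip(w, client_models))

    clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
    counts = [100, 300, 600]
    print(fedavg(clients, counts))  # -> [0.4, 0.6]

Weighting by example count is what makes the aggregate sensitive to non-IID client data, the algorithmic constraint this paper works around.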
Training Production Language Models without Memorizing User Data
This paper presents the first consumer-scale next-word prediction (NWP) model trained with Federated Learning (FL) while leveraging the Differentially Private Federated Averaging (DP-FedAvg) technique. There has been prior work on building…
Understanding Unintended Memorization in Federated Learning
Recent works have shown that generative sequence models (e.g., language models) have a tendency to memorize rare or unique sequences in the training data. Since useful models are often trained on sensitive data, to ensure the privacy of th…
Generative Models for Effective ML on Private, Decentralized Datasets
To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data - of representative samples, of outliers, of misclas…