Ashish Seth
Service access for youth with neurodevelopmental disabilities transitioning to adulthood: service providers’ and decision-makers’ perspectives on barriers, facilitators and policy recommendations Open
Introduction Youth with Neurodevelopmental Disabilities (NDD) who are transitioning to adulthood often struggle with accessing services. This limited access can result in poorer health, reduced ability to perform daily activities and engag…
One size does not fit all: a qualitative exploration of the experiences of Canadian youth with neurodevelopmental disabilities during the COVID-19 pandemic Open
Introduction Youth with neurodevelopmental disabilities (NDD) were disproportionately impacted by the COVID-19 pandemic due to health and socioeconomic factors and system level disruption of essential supports. To date, few studies have en…
A Dynamic Traffic Flow Optimization Approach for Congestion Mitigation Open
Traffic congestion can be considered one of the most significant global challenges. As the world's population grows and more vehicles are produced, traffic jams are becoming increasingly frequent, particularly during rush hours when people…
Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models Open
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance in complex multimodal tasks. However, these models still suffer from hallucinations, particularly when required to implicitly recognize or infer diverse visual e…
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark Open
The ability to comprehend audio--which includes speech, non-speech sounds, and music--is crucial for AI agents to interact effectively with the world. We present MMAU, a novel benchmark designed to evaluate multimodal audio understanding m…
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning Open
In this paper, we present EH-MAM (Easy-to-Hard adaptive Masked Acoustic Modeling), a novel self-supervised learning approach for speech representation learning. In contrast to the prior methods that use random masking schemes for Masked Ac…
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Open
Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) wi…
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition Open
Visual cues, like lip motion, have been shown to improve the performance of Automatic Speech Recognition (ASR) systems in noisy environments. We propose LipGER (Lip Motion aided Generative Error Correction), a novel framework for leveragin…
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models Open
A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved perfo…
Applying Clustering to Predict Attackers Trace in Deceptive Ecosystem by Harmonizing Multiple Decoys Interactions Logs Open
Bluff and truth are major pillars of deception technology. Deception technology relies heavily on decoy-generated data and looks for any behavioral deviation to decide whether to flag an interaction as an attack. But at times a legitimate user can al…
DeAR: Debiasing Vision-Language Models with Additive Residuals Open
Large pre-trained vision-language models (VLMs) reduce the time for developing predictive models for various vision-grounded language downstream tasks by providing rich, adaptable image and text representations. However, these models suffe…
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation Open
In this paper, we introduce UnFuSeD, a novel approach to leverage self-supervised learning and reduce the need for large amounts of labeled data for audio classification. Unlike prior works, which directly fine-tune a self-supervised pre-t…
A novel security framework for threat management of cloud based computing networks Open
There are many different ways to store data on the cloud, the most common of which are logical pools. Although there are numerous advantages to cloud storage, such as scalability, usability, and cost savings, the biggest danger to cloud sto…
MAST: Multiscale Audio Spectrogram Transformers Open
We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST). Given an input audio spectrogram, we first patchify…
SLICER: Learning universal audio representations using low-resource self-supervised pre-training Open
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification. Our primary aim is to learn audio represent…
Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages Open
Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target langu…
Mental health challenges during COVID-19: perspectives from parents with children with neurodevelopmental disabilities Open
Emergency preparedness planning requires a disability inclusive approach allocating resources for family supports in the home and community. Families identified supports to minimize further pandemic disruptions and enhance recovery.
Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition Open
Self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings. However, the common assumption made in literature is tha…
DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning Open
Inspired by the recent progress in self-supervised learning for computer vision, in this paper we introduce DeLoRes, a new general-purpose audio representation learning approach. Our main objective is to make our network learn representati…
DECAR: Deep Clustering for learning general-purpose Audio Representations Open
We introduce DECAR, a self-supervised pre-training approach for learning general-purpose audio representations. Our system is based on clustering: it utilizes an offline clustering step to provide target labels that act as pseudo-labels fo…
The Novel Multi-Layered Approach to Enhance the Sorting Performance of Healthcare Analysis Open
The emergence of big data in today’s world poses new challenges for the sorting strategies used to analyze data. For most analysis techniques, sorting is considered an implicit attribute of the technique used. The avail…
Dual Script E2E Framework for Multilingual and Code-Switching ASR Open
India is home to multiple languages, and training automatic speech recognition (ASR) systems for these languages is challenging. Over time, each language has adopted words from other languages, such as English, leading to code-mixing. Most India…