Yong Qin
Mechanism and field validation of fly-ash grouting for mitigating mining-induced surface subsidence Open
The extensive extraction of global coal resources has induced severe surface subsidence in goaf areas, posing significant threats to the surrounding environment and infrastructure. This study investigates the application and mechanisms of…
Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning Open
Supervised fine-tuning (SFT) of large language models can be viewed as an off-policy learning problem, where expert demonstrations come from a fixed behavior policy while training aims to optimize a target policy. Importance sampling is th…
TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models Open
Text-to-Audio (TTA) generation has made rapid progress, but current evaluation methods remain narrow, focusing mainly on perceptual quality while overlooking robustness, generalization, and ethical concerns. We present TTA-Bench, a compreh…
A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition Open
Dysarthric speech recognition (DSR) enhances the accessibility of smart devices for dysarthric speakers with limited mobility. Previously, DSR research was constrained by the fact that existing datasets typically consisted of isolated word…
RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval Open
The Contrastive Language-Audio Pretraining (CLAP) model has demonstrated excellent performance in general audio description-related tasks, such as audio retrieval. However, in the emerging field of emotional speaking style description (ESS…
UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture Open
As Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture designed to enhance scalability, perfo…
CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition Open
Code-switching (CS), the alternation between two or more languages within a single conversation, presents significant challenges for automatic speech recognition (ASR) systems. Existing Mandarin-English code-switching datasets often suffer…
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching Open
To advance continuous-valued token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching. By leveraging the autoregressive nature of language …
SDPO: Segment-Level Direct Preference Optimization for Social Agents Open
Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human pr…
Analysis of High‐Speed Train Operation Accidents Based on the Improved SHIPP Open
Safety anomalies are the early warning and precursors to major accidents. Preventing such incidents requires robust accident models to identify and mitigate risk factors. This study enhances the system hazard identification, prediction, an…
An In-Depth Tutorial on BJTU-RAO Bogie Datasets for Fault Diagnosis Open
The reliability and safety of trains have always been the top priority in the railway industry. As the critical subsystem of trains, the health states of bogie transmission systems directly affect the operation safety of trains. Train faul…
Impact of cascade reservoir on the sources of organic matter in sediments of Lancang river Open
The construction of dams to intercept natural rivers constitutes the most severe human activity influencing the underlying surface. This study focuses on four cascade reservoirs of the Lancang River and explores their impact on the migrati…
Research on Perforation-Adding Tapping Strategies for Perforation-avoided Production Wells of Block A in Daqing Oilfield Open
To avoid inefficient and ineffective circulation and improve the producing degree of low water-cut layers, ultra-high water-cut layers of new wells were not perforated before putting into production. In order to fully tap remaining oil in …
ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5 Open
Automatic speech recognition (ASR) systems have advanced significantly with models like Whisper, Conformer, and self-supervised frameworks such as Wav2vec 2.0 and HuBERT. However, developing robust ASR models for young children's speech re…
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework Open
Diffusion-based text-to-audio (TTA) generation has made substantial progress, leveraging latent diffusion model (LDM) to produce high-quality, diverse and instruction-relevant audios. However, beyond generation, the task of audio editing r…
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper Open
State-of-the-art models like OpenAI's Whisper exhibit strong performance in multilingual automatic speech recognition (ASR), but they still face challenges in accurately recognizing diverse subdialects. In this paper, we propose M2R-whispe…
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge Open
The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin. The challenge comprises three track…
Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation Open
Dysarthric speech recognition (DSR) presents a formidable challenge due to inherent inter-speaker variability, leading to severe performance degradation when applying DSR models to new dysarthric speakers. Traditional speaker adaptation me…
Uncertainty-Aware Mean Opinion Score Prediction Open
Mean Opinion Score (MOS) prediction has made significant progress in specific domains. However, the unstable performance of MOS prediction models across diverse samples presents ongoing challenges in the practical application of these syst…
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation Open
Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of …
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection Open
The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied t…
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores Open
The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios like code-switching presents challenges. Although there is potential for performanc…
A Spatiotemporal Probabilistic Graphical Model Based on Adaptive Expectation-Maximization Attention for Individual Trajectory Reconstruction Considering Incomplete Observations Open
Spatiotemporal information on individual trajectories in urban rail transit is important for operational strategy adjustment, personalized recommendation, and emergency command decision-making. However, due to the lack of journey observati…
A High-Precision Fall Detection Model Based on Dynamic Convolution in Complex Scenes Open
Falls can cause significant harm, and even death, to elderly individuals. Therefore, it is crucial to have a highly accurate fall detection model that can promptly detect and respond to changes in posture. The YOLOv8 model may not effectiv…