Explanipedia

Mechanism and field validation of fly-ash grouting for mitigating mining-induced surface subsidence Open

Yong Qin, Chen Cao · 2025

The extensive extraction of global coal resources has induced severe surface subsidence in goaf areas, posing significant threats to the surrounding environment and infrastructure. This study investigates the application and mechanisms of…

Mind the Gap: Data Rewriting for Stable Off-Policy Supervised Fine-Tuning Open

Shiwan Zhao, Xuyang Zhao, Jiaming Zhou, Aobo Kong, Qicheng Li, et al. · 2025

Supervised fine-tuning (SFT) of large language models can be viewed as an off-policy learning problem, where expert demonstrations come from a fixed behavior policy while training aims to optimize a target policy. Importance sampling is th…

TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models Open

Hui Wang, Junyang Chen, Huimin Liu, Yuhang Jia, Jiaming Zhou, et al. · 2025

Text-to-Audio (TTA) generation has made rapid progress, but current evaluation methods remain narrow, focusing mainly on perceptual quality while overlooking robustness, generalization, and ethical concerns. We present TTA-Bench, a compreh…

A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition Open

Shiyao Wang, Jiaming Zhou, Shiwan Zhao, Yong Qin · 2025

Dysarthric speech recognition (DSR) enhances the accessibility of smart devices for dysarthric speakers with limited mobility. Previously, DSR research was constrained by the fact that existing datasets typically consisted of isolated word…

RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval Open

Huijun Sun, Jingguang Tian, Jiaming Zhou, Wang Hui, Jie He, et al. · 2025

The Contrastive Language-Audio Pretraining (CLAP) model has demonstrated excellent performance in general audio description-related tasks, such as audio retrieval. However, in the emerging field of emotional speaking style description (ESS…

UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture Open

Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Christopher H.K. Cheng, et al. · 2025

Computer science Art

As Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture designed to enhance scalability, perfo…

CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition Open

Jiaming Zhou, Yujie Guo, Shiwan Zhao, Haoqin Sun, Hui Wang, et al. · 2025

Code-switching (CS), the alternation between two or more languages within a single conversation, presents significant challenges for automatic speech recognition (ASR) systems. Existing Mandarin-English code-switching datasets often suffer…

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching Open

Hui Wang, Shujie Liu, Lingwei Meng, Jinyu Li, Yifan Yang, et al. · 2025

Computer science Mathematics

To advance continuous-valued token modeling and temporal-coherence enforcement, we propose FELLE, an autoregressive model that integrates language modeling with token-wise flow matching. By leveraging the autoregressive nature of language …

SDPO: Segment-Level Direct Preference Optimization for Social Agents Open

Aobo Kong, Wentao Ma, Shiwan Zhao, Yongbin Li, Yuchuan Wu, et al. · 2025

Computer science Business Economics

Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human pr…

Analysis of High‐Speed Train Operation Accidents Based on the Improved SHIPP Open

Jingwei Li, Yong Qin, Xiaoqing Cheng, Chen-Liang Xu, Jun Yang · 2025

Computer science Environmental science Engineering

Safety anomalies are the early warning and precursors to major accidents. Preventing such incidents requires robust accident models to identify and mitigate risk factors. This study enhances the system hazard identification, prediction, an…

An In-Depth Tutorial on BJTU-RAO Bogie Datasets for Fault Diagnosis Open

Yong Qin, Y. Wang, Zhaojun Li, Biao Wang, Ao Ding, et al. · 2025

Computer science Engineering Medicine

The reliability and safety of trains have always been the top priority in the railway industry. As the critical subsystem of trains, the health states of bogie transmission systems directly affect the operation safety of trains. Train faul…

Impact of cascade reservoir on the sources of organic matter in sediments of Lancang river Open

Yufei Bao, Meng Sun, Yuchun Wang, Ji Lu, Yajie Wu, et al. · 2024

Environmental science Geology Chemistry

The construction of dams to intercept natural rivers constitutes the most severe human activity influencing the underlying surface. This study focuses on four cascade reservoirs of the Lancang River and explores their impact on the migrati…

Research on Perforation-Adding Tapping Strategies for Perforation-avoided Production Wells of Block A in Daqing Oilfield Open

Yong Qin · 2024

Geology Engineering Mathematics

To avoid inefficient and ineffective circulation and improve the producing degree of low water-cut layers, ultra-high water-cut layers of new wells were not perforated before putting into production. In order to fully tap remaining oil in …

ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5 Open

Jiaming Zhou, Shiyao Wang, Shiwan Zhao, Jie He, Haoqin Sun, et al. · 2024

Psychology Computer science Philosophy

Automatic speech recognition (ASR) systems have advanced significantly with models like Whisper, Conformer, and self-supervised frameworks such as Wav2vec 2.0 and HuBERT. However, developing robust ASR models for young children's speech re…

AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework Open

Yuhang Jia, Yang Chen, Jinghua Zhao, Song Zhao, Wenjun Zeng, et al. · 2024

Computer science Geography Physics

Diffusion-based text-to-audio (TTA) generation has made substantial progress, leveraging latent diffusion model (LDM) to produce high-quality, diverse and instruction-relevant audios. However, beyond generation, the task of audio editing r…

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper Open

Jiaming Zhou, Shiwan Zhao, Jizhou He, Hui Wang, W. Zeng, et al. · 2024

Computer science Geology Geography

State-of-the-art models like OpenAI's Whisper exhibit strong performance in multilingual automatic speech recognition (ASR), but they still face challenges in accurately recognizing diverse subdialects. In this paper, we propose M2R-whispe…

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge Open

Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, Lezhi Wang, et al. · 2024

Computer science Psychology Medicine

The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin. The challenge comprises three track…

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation Open

Shiyao Wang, Shiwan Zhao, Jiaming Zhou, Aobo Kong, Yong Qin · 2024

Computer science Psychology

Dysarthric speech recognition (DSR) presents a formidable challenge due to inherent inter-speaker variability, leading to severe performance degradation when applying DSR models to new dysarthric speakers. Traditional speaker adaptation me…

Uncertainty-Aware Mean Opinion Score Prediction Open

Hui Wang, Shiwan Zhao, Jiaming Zhou, Xiguang Zheng, Haoqin Sun, et al. · 2024

Computer science Mathematics

Mean Opinion Score (MOS) prediction has made significant progress in specific domains. However, the unstable performance of MOS prediction models across diverse samples presents ongoing challenges in the practical application of these syst…

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation Open

Wenhao Guan, Kaidi Wang, Wangjin Zhou, Yang Wang, Feng Deng, et al. · 2024

Computer science Mathematics

Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of …

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection Open

Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, et al. · 2024

Psychology Computer science Medicine

The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied t…

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores Open

Jiaming Zhou, Shiwan Zhao, Hui Wang, Tianhao Zhang, Haoqin Sun, et al. · 2024

Computer science Materials science Philosophy

The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios like code-switching presents challenges. Although there is potential for performanc…

A Spatiotemporal Probabilistic Graphical Model Based on Adaptive Expectation-Maximization Attention for Individual Trajectory Reconstruction Considering Incomplete Observations Open

Xuan Sun, Jianyuan Guo, Yong Qin, Xuanchuan Zheng, Shifeng Xiong, et al. · 2024

Computer science Mathematics Physics

Spatiotemporal information on individual trajectories in urban rail transit is important for operational strategy adjustment, personalized recommendation, and emergency command decision-making. However, due to the lack of journey observati…

A High-Precision Fall Detection Model Based on Dynamic Convolution in Complex Scenes Open

Yong Qin, Wuqing Miao, Qian Chen · 2024

Computer science Mathematics Physics

Falls can cause significant harm, and even death, to elderly individuals. Therefore, it is crucial to have a highly accurate fall detection model that can promptly detect and respond to changes in posture. The YOLOv8 model may not effectiv…

Yong Qin