Explanipedia

FOXO1-NMNAT3 axis dysregulation promotes doxorubicin cardiotoxicity: NAD⁺ replenishment as a redox-targeted antioxidant therapy Open

Fang Cheng, Minzhu Zhao, Qi Wang, Hongli Xiong, Kai Yu, et al. · 2025

This study establishes the dysregulation of the FOXO1-NMNAT3 axis as a key mechanism underlying NAD⁺ depletion in DIC. Targeting this axis through NAD⁺ replenishment, particularly by activating NMNAT3, offers a novel …

Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis Open

Qingyu Liu, Yushen Chen, Zhikang Niu, Chunhui Wang, Yunting Yang, et al. · 2025

Flow-matching-based text-to-speech (TTS) models have shown high-quality speech synthesis. However, most current flow-matching-based TTS models still rely on reference transcripts corresponding to the audio prompt for synthesis. This depend…

Research on Tennis Match Outcome Prediction Based on Multi-Algorithm Integration and Bayesian Analysis Open

Kai Yu, Jingjing Liu, Xutao Meng · 2025

The intense competition in the men's singles final of the 2023 Wimbledon Championships highlighted the dynamic and unpredictable nature of tennis matches. Inspired by this observation, this study aims to quantify and analyze momentum shift…

Robust and Efficient Autoregressive Speech Synthesis with Dynamic Chunk-wise Prediction Policy Open

Bohan Li, Zhihan Li, Haoran Wang, Hanglei Zhang, Yiwei Guo, et al. · 2025

Recently, autoregressive (AR) language models have emerged as a dominant approach in speech synthesis, offering expressive generation and scalable training. However, conventional AR speech synthesis models relying on the next-token predict…

CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate Open

Hankun Wang, Yiwei Guo, Chunli Shao, Bohan Li, Xie Chen, et al. · 2025

Neural speech codecs have been widely used in audio compression and various downstream tasks. Current mainstream codecs are fixed-frame-rate (FFR), which allocate the same number of tokens to every equal-duration slice. However, speech is …

Improving estimation of winter wheat biophysical traits using solar-induced fluorescence indices and a multi-task Gaussian process model Open

Ying‐Jin Yuan, Kai Yu, Youmin Hu, A. Belwalkar · 2025

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling Open

Qiankun Zheng, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiaofei Wang, et al. · 2025

Flow-matching-based text-to-speech (TTS) models, such as Voicebox, E2 TTS, and F5-TTS, have attracted significant attention in recent years. These models require multiple sampling steps to reconstruct speech from noise, making inference sp…

MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding Open

Yu Xi, Haoyu Li, Xiaoyu Gu, Yidi Jiang, Kai Yu · 2025

Keyword spotting (KWS) is essential for voice-driven applications, demanding both accuracy and efficiency. Traditional ASR-based KWS methods, such as greedy and beam search, explore the entire search space without explicitly prioritizing k…

VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining Open

Jianheng Zhuo, Yiwen Shao, Yong Xu, Yu Dong, Kai Yu, et al. · 2025

Automatic speech recognition (ASR) has made remarkable progress but heavily relies on large-scale labeled data, which is scarce for low-resource languages like Vietnamese. While existing systems such as Whisper, USM, and MMS achieve promis…

Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate Open

Yiwei Guo, Zhihan Li, Xiang Hao, Xie Chen, Kai Yu · 2025

Most neural speech codecs achieve bitrate adjustment through intra-frame mechanisms, such as codebook dropout, at a Constant Frame Rate (CFR). However, speech segments inherently have time-varying information density (e.g., silent interval…

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis Open

Yifan Yang, Shujie Liu, Jinyu Li, Yuxuan Hu, Haibin Wu, et al. · 2025

Recent zero-shot text-to-speech (TTS) systems face a common dilemma: autoregressive (AR) models suffer from slow generation and lack duration controllability, while non-autoregressive (NAR) models lack temporal modeling and typically requi…

Developing ChemDFM as a large language foundation model for chemistry Open

Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, et al. · 2025

Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling Open

Yuncong Liu, P.K. Fung, Kai Yu · 2025

Large language models (LLMs) are prone to hallucination stemming from misaligned self-awareness, particularly when processing queries exceeding their knowledge boundaries. While existing mitigation strategies employ uncertainty estimation …

Prediction of Soil Heavy Metal Extraction Efficiency by Leaching Agents and Identification of Key Factors Based on Machine Learning Algorithms Open

Liping Qiu, Jingtao Wu, Zhong Ren, Xianhua Qiu, Wei Xiong, et al. · 2025

Alignment for Efficient Tool Calling of Large Language Models Open

Hongshen Xu, Zihan Wang, Zichen Zhu, Lei Pan, Xingyu Chen, et al. · 2025

DFM: Dialogue foundation model for universal large-scale dialogue-oriented task learning Open

Zhi Chen, Da Ma, Hanqi Li, Lu Chen, Jiabao Ji, et al. · 2025

Building a universal conversational agent has been a long-standing goal of the dialogue research community. Most previous works only focus on a small set of dialogue tasks. In this work, we aim to build a unified dialogue foundation model …

UiO series of MOFs and their composites for photocatalytic CO2 reduction: A review Open

Liqing Shi, Tianyuan Xin, Peng Cheng, Zhengying Wu, Shuangxi Liu, et al. · 2025

Photocatalytic reduction of CO2 to produce valuable fuels or chemicals is a promising CO2 utilization technology, which is of great significance for carbon emission reduction. The unique features of the UiO series of metal-organic framewor…

Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation Open

Pengchao Feng, Ziyang Ma, Wenxi Chen, Yao Li, Sheng Wang, et al. · 2025

MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation Open

Zichen Zhu, Hao Tang, Y.F. Li, Dan Liu, Hongshen Xu, et al. · 2025

Research on Evacuation Behavior in Urban Villages Based on Social Networks Open

Kai Yu, Tianyu Wang, Lujie Zhou, Menghan Wang, Zhengwei Li · 2025

Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective Open

Hankun Wang, Haoran Wang, Yiwei Guo, Zhihan Li, Chenpeng Du, et al. · 2024

Although text-based large language models exhibit human-level writing ability and remarkable intelligence, speech language models (SLMs) still struggle to generate semantically coherent outputs. There are several potential reasons for this…

Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency Open

Yu Xi, Haoyu Li, Xiaoyu Gu, Hao Li, Yidi Jiang, et al. · 2024

Connectionist Temporal Classification (CTC), a non-autoregressive training criterion, is widely used in online keyword spotting (KWS). However, existing CTC-based KWS decoding strategies either rely on Automatic Speech Recognition (ASR), w…

NTC-KWS: Noise-aware CTC for Robust Keyword Spotting Open

Yu Xi, Haoyu Li, Hao Li, Jiaqi Guo, Xu Li, et al. · 2024

In recent years, there has been a growing interest in designing small-footprint yet effective Connectionist Temporal Classification based keyword spotting (CTC-KWS) systems. They are typically deployed on low-resource computing platforms, …

Reducing Tool Hallucination via Reliability Alignment Open

Hongshen Xu, Suming Zhu, Zihan Wang, Hang Zheng, Da Ma, et al. · 2024

Large Language Models (LLMs) have expanded their capabilities beyond language generation to interact with external tools, enabling automation and real-world applications. However, tool hallucinations, where models either select inappropria…

Unified Pathological Speech Analysis with Prompt Tuning Open

Yang Fei, Xuenan Xu, Mengyue Wu, Kai Yu · 2024

Pathological speech analysis has been of interest in the detection of certain diseases like depression and Alzheimer's disease and attracts much interest from researchers. However, previous pathological speech analysis models are commonly …

MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation Open

Zichen Zhu, Hao Tang, Y.F. Li, Kunyao Lan, Yixuan Jiang, et al. · 2024

Existing Multimodal Large Language Model (MLLM)-based agents face significant challenges in handling complex GUI (Graphical User Interface) interactions on devices. These challenges arise from the dynamic and structured nature of GUI envir…

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Open

Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, et al. · 2024

This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as duration model, text encoder, and phoneme alignment, the text…

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders Open

Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, et al. · 2024

We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice conversion (VC). We use discrete tokens from speech self-supervised models as the content features of source speech, and treat VC as a prompted vocoding task…

Semi-supervised Learning for Code-Switching ASR with Large Language Model Filter Open

Yu Xi, Wen Ding, Kai Yu, Junjie Lai · 2024

Code-switching (CS) phenomenon occurs when words or phrases from different languages are alternated in a single sentence. Due to data scarcity, building an effective CS Automatic Speech Recognition (ASR) system remains challenging. In this…

Text-aware Speech Separation for Multi-talker Keyword Spotting Open

Haoyu Li, Baochen Yang, Yu Xi, Linfeng Yu, Tan Tian Swee, et al. · 2024

For noisy environments, ensuring the robustness of keyword spotting (KWS) systems is essential. While much research has focused on noisy KWS, less attention has been paid to multi-talker mixed speech scenarios. Unlike the usual cocktail pa…

Kai Yu