
Heptapod: Language Modeling on Visual Signals

Yongxin Zhu, Jiawei Chen, Yuanzhe Chen, Zhuo Chen, Dongya Jia, et al. · 2025

We introduce Heptapod, an image autoregressive model that adheres to the foundational principles of language modeling. Heptapod employs causal attention, eliminates reliance on CFG, and eschews the trend of semant…

Optimal Control Strategy for Coordinated Charging of Electric Bus Fleets Based on a Traffic-Electrical Coupling Mathematical Model Using Mixed Integer Programming Algorithm

Yuanzhe Chen · 2025

Towards Reliable Large Audio Language Model

Ziyang Ma, Xiquan Li, Yakun Song, Wenxi Chen, Chenpeng Du, et al. · 2025

Recent advancements in large audio language models (LALMs) have demonstrated impressive results and promising prospects in universal understanding and reasoning across speech, music, and general sound. However, these models still lack the …

Agent-in-the-Loop to Distill Expert Knowledge into Artificial Intelligence Models: A Survey

Jiayuan Gao, Yingwei Zhang, Yiqiang Chen, Yihan Dong, Yuanzhe Chen, et al. · 2025

Large-scale neural networks have revolutionized many general knowledge areas (e.g., computer vision and language processing), but are still rarely applied in many expert knowledge areas (e.g., healthcare), due to data sparsity and high ann…

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang · 2024

StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognitio…

Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions

Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, et al. · 2024

Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from…

NR5A2 promotes epithelial-to-mesenchymal transition in renal fibrosis by targeting MMP25 transcription

Xiao Wang, Guang Chen, Weimin Shan, Yuanzhe Chen, Wei Wang, et al. · 2024

Epithelial-to-mesenchymal transition (EMT) is crucial for the progression of renal tubulointerstitial fibrosis, typically leading to end-stage renal failure. The role of Nuclear receptor subfamily 5 group A member 2 (NR5A2) in renal fibros…

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, et al. · 2024

We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and e…

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, et al. · 2024

Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture tempora…

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Zhuo Chen, Lei Xie, et al. · 2024

Recent language model (LM) advancements have showcased impressive zero-shot voice conversion (VC) performance. However, existing LM-based VC models usually apply offline conversion from source semantics to acoustic features, demanding the …

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yu‐Ping Wang · 2023

Language model (LM) based audio generation frameworks, e.g., AudioLM, have recently achieved new state-of-the-art performance in zero-shot audio generation. In this paper, we explore the feasibility of LMs for zero-shot voice conversion. A…

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, et al. · 2023

Zero-shot voice conversion (VC) converts source speech into the voice of any desired speaker using only one utterance of the speaker without requiring additional model updates. Typical methods use a speaker representation from a pre-traine…

LaTeX2Solver: a Hierarchical Semantic Parsing of LaTeX Document into Code for an Assistive Optimization Modeling Application

Rindra Ramamonjison, Timothy T. Yu, Linzi Xing, Mahdi Mostajabdaveh, Xiaorui Li, et al. · 2023

Rindra Ramamonjison, Timothy Yu, Linzi Xing, Mahdi Mostajabdaveh, Xiaorui Li, Xiaojin Fu, Xiongwei Han, Yuanzhe Chen, Ren Li, Kun Mao, Yong Zhang. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Vol…

Design and Experiment of Online Cottonseed Quality Sorting Device

Qiaohua Wang, Yu Chengdong, Hongzhou Zhang, Yuanzhe Chen, Chengkang Liu · 2023

Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network

Dongya Jia, Qiao Tian, Jiaxin Li, Yuanzhe Chen, Kainan Peng, et al. · 2022

The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and d…

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, et al. · 2022

Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC). However, in a low-resource situation, where only limited utterances from the target…

Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance

Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, et al. · 2022

Streaming voice conversion (VC) is the task of converting the voice of one person to another in real-time. Previous streaming VC methods use phonetic posteriorgrams (PPGs) extracted from automatic speech recognition (ASR) systems to repres…

Cloning one's voice using very limited data in the wild

Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, et al. · 2021

With the increasing popularity of speech synthesis products, the industry has put forward more requirements for personalized speech synthesis: (1) How to use low-resource, easily accessible data to clone a person's voice. (2) How to clone …

DFSeer: A Visual Analytics Approach to Facilitate Model Selection for Demand Forecasting

Dong Sun, Zezheng Feng, Yuanzhe Chen, Yong Wang, Jia Zeng, et al. · 2020

Selecting an appropriate model to forecast product demand is critical to the manufacturing industry. However, due to the data complexity, market uncertainty and users' demanding requirements for the model, it is challenging for demand a…

Salt‐tolerant Staphylococcus bacteria induce structural and nutritional alterations of salted duck egg white

Gongnian Xiao, Yuanzhe Chen, Ruosi Fang, Chaogeng Xiao, Haina Yuan, et al. · 2019

Salted duck egg white, a major by‐product of salted egg yolk production, is rich in nutrients. However, its high salinity limits its application in the food industry. In the present study, three haloduric bacterium strains (C1, C2, and C3)…

Vulnerability Parser: A Static Vulnerability Analysis System for Android Applications

Yingxian Chang, Bin Liu, Lianri Cong, Hua Deng, Jiaming Li, et al. · 2019

In the case of user information leakage, the security problem of Android applications is of great importance. How to quickly and efficiently detect Android application security vulnerabilities has become an urgent research topic in securit…

PlanningVis: A Visual Analytics Approach to Production Planning in Smart Factories

Dong Sun, Renfei Huang, Yuanzhe Chen, Yong Wang, Jia Zeng, et al. · 2019

Production planning in the manufacturing industry is crucial for fully utilizing factory resources (e.g., machines, raw materials and workers) and reducing costs. With the advent of industry 4.0, plenty of data recording the status of fact…

Non-Invasive Blood Glucose Monitoring of 95% Certainty by Pressure Regulated Mid-IR

Yuanzhe Chen · 2019

To fight diabetes mellitus, a chronic metabolic disease that affects more than 400 million people worldwide, patients have to puncture their fingers 4-5 times a day when using a glucometer for the blood glucose level…

Understanding Hidden Memories of Recurrent Neural Networks

Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, et al. · 2017

Recurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their eff…

A new network-based algorithm for human activity recognition in video

Weiyao Lin, Yuanzhe Chen, Jianxin Wu, Hanli Wang, Bin Sheng, et al. · 2015

In this paper, a new network-transmission-based (NTB) algorithm is proposed for human activity recognition in videos. The proposed NTB algorithm models the entire scene as an error-free network. In this network, each node corresponds to a …

Yuanzhe Chen