Yuanzhe Chen
Heptapod: Language Modeling on Visual Signals Open
We introduce Heptapod, an image autoregressive model that adheres to the foundational principles of language modeling. Heptapod employs causal attention, eliminates reliance on CFG, and eschews the trend of semant…
Optimal Control Strategy for Coordinated Charging of Electric Bus Fleet Based on Traffic-Electrical Coupling Mathematical Model Using Mixed Integer Programming Algorithm Open
Optimal Control Strategy for Coordinated Charging of Electric Bus Fleets Based on a Traffic-Electrical Coupling Mathematical Model Using Mixed Integer Programming Algorithm Open
Agent-in-the-loop to distill expert knowledge into artificial intelligence models: a survey Open
Towards Reliable Large Audio Language Model Open
Recent advancements in large audio language models (LALMs) have demonstrated impressive results and promising prospects in universal understanding and reasoning across speech, music, and general sound. However, these models still lack the …
Agent-in-the-Loop to Distill Expert Knowledge into Artificial Intelligence Models: A Survey Open
Large-scale neural networks have revolutionized many general knowledge areas (e.g., computer vision and language processing), but are still rarely applied in many expert knowledge areas (e.g., healthcare), due to data sparsity and high ann…
StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion Open
StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognitio…
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions Open
Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from…
NR5A2 promotes epithelial-to-mesenchymal transition in renal fibrosis by targeting MMP25 transcription Open
Epithelial-to-mesenchymal transition (EMT) is crucial for the progression of renal tubulointerstitial fibrosis, typically leading to end-stage renal failure. The role of Nuclear receptor subfamily 5 group A member 2 (NR5A2) in renal fibros…
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Open
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and e…
T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining Open
Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture tempora…
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion Open
Recent language model (LM) advancements have showcased impressive zero-shot voice conversion (VC) performance. However, existing LM-based VC models usually apply offline conversion from source semantics to acoustic features, demanding the …
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models Open
Language model (LM) based audio generation frameworks, e.g., AudioLM, have recently achieved new state-of-the-art performance in zero-shot audio generation. In this paper, we explore the feasibility of LMs for zero-shot voice conversion. A…
Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion Open
Zero-shot voice conversion (VC) converts source speech into the voice of any desired speaker using only one utterance of the speaker without requiring additional model updates. Typical methods use a speaker representation from a pre-traine…
LaTeX2Solver: a Hierarchical Semantic Parsing of LaTeX Document into Code for an Assistive Optimization Modeling Application Open
Rindra Ramamonjison, Timothy Yu, Linzi Xing, Mahdi Mostajabdaveh, Xiaorui Li, Xiaojin Fu, Xiongwei Han, Yuanzhe Chen, Ren Li, Kun Mao, Yong Zhang. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Vol…
Design and Experiment of Online Cottonseed Quality Sorting Device Open
Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network Open
The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and d…
Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints Open
Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC). However, in a low-resource situation, where only limited utterances from the target…
Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance Open
Streaming voice conversion (VC) is the task of converting the voice of one person to another in real-time. Previous streaming VC methods use phonetic posteriorgrams (PPGs) extracted from automatic speech recognition (ASR) systems to repres…
Cloning one's voice using very limited data in the wild Open
With the increasing popularity of speech synthesis products, the industry has put forward more requirements for personalized speech synthesis: (1) How to use low-resource, easily accessible data to clone a person's voice. (2) How to clone …
DFSeer: A Visual Analytics Approach to Facilitate Model Selection for Demand Forecasting Open
Selecting an appropriate model to forecast product demand is critical to the manufacturing industry. However, due to the data complexity, market uncertainty and users' demanding requirements for the model, it is challenging for demand a…
Salt‐tolerant Staphylococcus bacteria induce structural and nutritional alterations of salted duck egg white Open
Salted duck egg white, a major by‐product of salted egg yolk production, is rich in nutrients. However, its high salinity limits its application in the food industry. In the present study, three haloduric bacterium strains (C1, C2, and C3)…
Vulnerability Parser: A Static Vulnerability Analysis System for Android Applications Open
In the case of user information leakage, the security problem of Android applications is of great importance. How to quickly and efficiently detect Android application security vulnerabilities has become an urgent research topic in securit…
PlanningVis: A Visual Analytics Approach to Production Planning in Smart Factories Open
Production planning in the manufacturing industry is crucial for fully utilizing factory resources (e.g., machines, raw materials and workers) and reducing costs. With the advent of industry 4.0, plenty of data recording the status of fact…
NON-INVASIVE BLOOD GLUCOSE MONITORING OF 95% CERTAINTY BY PRESSURE REGULATED MID-IR Open
To fight against diabetes mellitus, a chronic metabolic disease, from which more than 400 million people suffer in the world, the patients have to puncture their fingers 4-5 times a day when using a glucometer for the blood glucose level…
Understanding Hidden Memories of Recurrent Neural Networks Open
Recurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their eff…
A new network-based algorithm for human activity recognition in video Open
In this paper, a new network-transmission-based (NTB) algorithm is proposed for human activity recognition in videos. The proposed NTB algorithm models the entire scene as an error-free network. In this network, each node corresponds to a …