Explanipedia

Cross-cultural dimensions organizing prosodic attitudes reception Open

Albert Rilliard, João Antônio de Moraes, Donna Erickson, Marine Guerry, Angelika Hönemann, et al. · 2025

We present a meta-analysis of results from experimental studies on attitude reception in seven languages (Brazilian Portuguese, Japanese, French, German, Cantonese, American English, Hindi). The studies involved free-labeling of perceived …

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation Open

Yujia Xiao, Liumeng Xue, Lei He, Xinyi Chen, Aemon Yat Fei Chiu, et al. · 2025

Recently, an increasing number of multimodal (text and audio) benchmarks have emerged, primarily focusing on evaluating models' understanding capability. However, exploration into assessing generative capabilities remains limited, especial…

An automated extraction of spectral-temporal and spatial-temporal features of EEG for emotion detection Open

Monira Islam, Tan Lee · 2025

Emotion is an integral part of human cognitive processes and behaviors. Automatic detection and classification of human emotion has been a goal of applied research. This study presents an approach to detecting emotion from multivariate ele…

Deviant functional connectivity patterns in the EEG related to developmental dyslexia and their potential use for screening Open

Yaqi Yang, Zhaoyu Liu, Brian W. L. Wong, Shuting Huo, Jie Wang, et al. · 2025

Psychology Philosophy

Developmental dyslexia (DD) is a common learning disorder with potential neural origins. While EEG-based brain activation measures combined with machine learning have shown promise for DD screening, these approaches often lack validation o…

PodAgent: A Comprehensive Framework for Podcast Generation Open

Yu Xiao, Lei He, Haohan Guo, Fenglong Xie, Tan Lee · 2025

Existing automatic audio generation methods struggle to generate podcast-like audio programs effectively. The key challenges lie in in-depth content generation, appropriate and expressive voice production. This paper proposed PodA…

A Large-Scale Probing Analysis of Speaker-Specific Attributes in Self-Supervised Speech Representations Open

Aemon Yat Fei Chiu, Pascale Fung, Rongsheng Li, Jingyu Li, Tan Lee · 2025

Computer science Philosophy

Speech self-supervised learning (SSL) models are known to learn hierarchical representations, yet how they encode different speaker-specific attributes remains under-explored. This study investigates the layer-wise disentanglement of speak…

An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems Open

Jingyu Li, Aemon Yat Fei Chiu, Tan Lee · 2024

Computer science Psychology Chemistry

Language mismatch is among the most common and challenging domain mismatches in deploying speaker verification (SV) systems. Adversarial reprogramming has shown promising results in cross-language adaptation for SV. The reprogramming is im…

CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research Open

Dehua Tao, Harold Chui, Sarah Luk, Tan Lee · 2024

Psychology Medicine

Psychotherapy or counseling is typically conducted through spoken conversation between a therapist and a client. Analyzing the speech characteristics of psychotherapeutic interactions can help understand the factors associated with effecti…

Personalized Voice Synthesis through Human-in-the-Loop Coordinate Descent Open

Yusheng Tian, Junbin Liu, Tan Lee · 2024

Computer science

This paper describes a human-in-the-loop approach to personalized voice synthesis in the absence of reference speech data from the target speaker. It is intended to help vocally disabled individuals restore their lost voices without requir…

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis Open

Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee · 2024

Computer science Mathematics

Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our prelim…

A Parameter-efficient Language Extension Framework for Multilingual ASR Open

Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee · 2024

Computer science Philosophy

Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilist…

Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss Open

Yusheng Tian, Jingyu Li, Tan Lee · 2024

Computer science Philosophy Art

This research is about the creation of personalized synthetic voices for head and neck cancer survivors. It is focused particularly on tongue cancer patients whose speech might exhibit severe articulation impairment. Our goal is to restore…

LUPET: Incorporating Hierarchical Information Path into Multilingual ASR Open

Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee · 2024

Computer science Physics Political science

Toward high-performance multilingual automatic speech recognition (ASR), various types of linguistic information and model design have demonstrated their effectiveness independently. They include language identity (LID), phoneme informatio…

Modeling Intrapersonal and Interpersonal Influences for Automatic Estimation of Therapist Empathy in Counseling Conversation Open

Dehua Tao, Tan Lee, Harold Chui, Sarah Luk · 2023

Psychology

Counseling is usually conducted through spoken conversation between a therapist and a client. The empathy level of therapist is a key indicator of outcomes. Presuming that therapist's empathy expression is shaped by their past behavior and…

A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation Open

Dehua Tao, Tan Lee, Harold Chui, Sarah Luk · 2023

Psychology Computer science Philosophy

Counseling is carried out as spoken conversation between a therapist and a client. The empathy level expressed by the therapist is considered an important index of the quality of counseling and often assessed by an observer or the client. …

Efficient Black-Box Speaker Verification Model Adaptation with Reprogramming and Backend Learning Open

Jingyu Li, Tan Lee · 2023

Computer science Physics Mathematics

The development of deep neural networks (DNN) has significantly enhanced the performance of speaker verification (SV) systems in recent years. However, a critical issue that persists when applying DNN-based SV systems in practical applicat…

CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning Open

Wei Liu, Zhiyuan Peng, Tan Lee · 2023

Computer science Mathematics Economics

Transformer-based speech recognition (ASR) model with deep layers exhibited significant performance improvement. However, the model is inefficient for deployment on resource-constrained devices. Layer pruning (LP) is a commonly used compre…

Sparsely Shared LoRA on Whisper for Child Speech Recognition Open

Wei Liu, Ying Qin, Zhiyuan Peng, Tan Lee · 2023

Computer science Mathematics Physics

Whisper is a powerful automatic speech recognition (ASR) model. Nevertheless, its zero-shot performance on low-resource speech requires further improvement. Child speech, as a representative type of low-resource speech, is leveraged for ad…

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading Open

Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, et al. · 2023

Computer science Biology Philosophy

While state-of-the-art Text-to-Speech systems can generate natural speech of very high quality at sentence level, they still meet great challenges in speech generation for paragraph / long-form reading. Such deficiencies are due to i) igno…

Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models Open

Yusheng Tian, Guangyan Zhang, Tan Lee · 2023

Computer science Political science Philosophy

This paper is about developing personalized speech synthesis systems with recordings of mildly impaired speech. In particular, we consider consonant and vowel alterations resulting from partial glossectomy, the surgical removal of part of t…

Learning Representation of Therapist Empathy in Counseling Conversation Using Siamese Hierarchical Attention Network Open

Dehua Tao, Tan Lee, Harold Chui, Sarah Luk · 2023

Psychology Computer science Political science

Counseling is an activity of conversational speaking between a therapist and a client. Therapist empathy is an essential indicator of counseling quality and assessed subjectively by considering the entire conversation. This paper proposes …

Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data Open

Yusheng Tian, Wei Liu, Tan Lee · 2023

Computer science Chemistry

Creating synthetic voices with found data is challenging, as real-world recordings often contain various types of audio degradation. One way to address this problem is to pre-enhance the speech with an enhancement model and then use the en…

Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring Open

Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, et al. · 2023

Computer science Philosophy

Recent studies on pronunciation scoring have explored the effect of introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or concatenation of reference phone embedding and actual pronunci…

An ASR-free Fluency Scoring Approach with Self-Supervised Learning Open

Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, et al. · 2023

Computer science Psychology

A typical fluency scoring system generally relies on an automatic speech recognition (ASR) system to obtain time stamps in input speech for either the subsequent calculation of fluency-related features or directly modeling speech fluency w…

Formality in psychotherapy: How are therapists’ and clients’ use of discourse particles related to therapist empathy? Open

Jonathan Him Nok Lee, Harold Chui, Tan Lee, Sarah Luk, Dehua Tao, et al. · 2022

Psychology Philosophy Computer science

Introduction: Previous studies explored the preferences for therapists’ attire and office setting based on initial impressions as a reference for the formality in psychotherapy. This study examines the formality of psychotherapy by investig…

Covariance Regularization for Probabilistic Linear Discriminant Analysis Open

Zhiyuan Peng, Mingjie Shao, Xuanji He, Xu Li, Tan Lee, et al. · 2022

Computer science Mathematics

Probabilistic linear discriminant analysis (PLDA) is commonly used in speaker verification systems to score the similarity of speaker embeddings. Recent studies improved the performance of PLDA in domain-matched conditions by diagonalizing…

Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition Open

Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan · 2022

Computer science Engineering Chemistry

Very deep models for speaker recognition (SR) have demonstrated remarkable performance improvement in recent research. However, it is impractical to deploy these models for on-device applications with constrained computational resources. O…

Model Compression for DNN-based Speaker Verification Using Weight Quantization Open

Jingyu Li, Zhaoyang Zhang, Jiong Wang, Tan Lee · 2022

Computer science Mathematics Chemistry

DNN-based speaker verification (SV) models demonstrate significant performance at relatively high computation costs. Model compression can be applied to reduce the model size for lower resource consumption. The present study exploits weigh…
