Sofoklis Kakouros
Towards an Automated Multimodal Approach for Video Summarization: Building a Bridge Between Text, Audio and Facial Cue-Based Summarization Open
The increasing volume of video content in educational, professional, and social domains necessitates effective summarization techniques that go beyond traditional unimodal approaches. This paper proposes a behaviour-aware multimodal video …
Investigating the Impact of Word Informativeness on Speech Emotion Recognition Open
In emotion recognition from speech, a key challenge lies in identifying speech signal segments that carry the most relevant acoustic variations for discerning specific emotions. Traditional approaches compute functionals for features such …
Sounding Like a Winner? Prosodic Differences in Post-Match Interviews Open
This study examines the prosodic characteristics associated with winning and losing in post-match tennis interviews. Additionally, this research explores the potential to classify match outcomes solely based on post-match interview recordi…
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody Open
This paper investigates the use of word surprisal, a measure of the predictability of a word in a given context, as a feature to aid speech synthesis prosody. We explore how word surprisal extracted from large language models (LLMs) correl…
The Power of Prosody and Prosody of Power: An Acoustic Analysis of Finnish Parliamentary Speech Open
Parliamentary recordings provide a rich source of data for studying how politicians use speech to convey their messages and influence their audience. This provides a unique context for studying how politicians use speech, especially prosod…
North Sámi Dialect Identification with Self-supervised Speech Models Open
The North Sámi (NS) language encapsulates four primary dialectal variants that are related but that also have differences in their phonology, morphology, and vocabulary. The unique geopolitical location of NS speakers means that in many ca…
What does BERT learn about prosody? Open
Language models have become nearly ubiquitous in natural language processing applications achieving state-of-the-art results in many tasks including prosody. As the model design does not define predetermined linguistic targets during train…
Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing Open
When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels. Self-supervis…
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations Open
Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks. Aggregating these speech representations across time is typically approached b…
The Effects of a Digital Articulatory Game on the Ability to Perceive Speech-Sound Contrasts in Another Language Open
Digital and mobile devices enable easy access to applications for the learning of foreign languages. However, experimental studies on the effectiveness of these applications are scarce. Moreover, it is not understood whether the effects of…
Comparative Analysis of Majority Language Influence on North Sámi Prosody Using WaveNet-Based modeling Open
Finnmark North Sámi is a variety of the North Sámi language, an indigenous, endangered minority language spoken in the northernmost parts of Norway and Finland. The speakers of this language are bilingual, and regularly speak the majority …
Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features Open
Prominence perception has been known to correlate with a complex interplay of the acoustic features of energy, fundamental frequency, spectral tilt, and duration. The contribution and importance of each of these features in distinguishing …
Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations Open
In this paper we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text. To our knowledge this will be the largest publicly available dataset with prosodic labels. We describe…
Cross-linguistic Influences on Sentence Accent Detection in Background Noise Open
This paper investigates whether sentence accent detection in a non-native language is dependent on (relative) similarity between prosodic cues to accent between the non-native and the native language, and whether cross-linguistic differenc…
The Effect of Noise on Emotion Perception in an Unknown Language Open
This is the first study investigating the influence of “realistic” noise on verbal emotion perception in an unknown language. We do so by linking emotion perception to acoustic characteristics known to be correlated with emotion perception…
Sentence Accent Perception in Noise by French Non-Native Listeners of English Open
Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions Open
Spectral tilt has been suggested to be a correlate of prominence in speech, although several studies have not replicated this empirically. This may be partially due to the lack of a standard method for tilt estimation from speech, renderin…
Cognitive and probabilistic basis of prominence perception in speech Open
The research in this thesis examines the topic of the cognitive and probabilistic nature of prominence perception in speech. In recent years, there has been an accumulating number of studies from linguistics, phonetics, and neuroscience pr…
Perception of Sentence Stress in Speech Correlates With the Temporal Unpredictability of Prosodic Features Open
Numerous studies have examined the acoustic correlates of sentential stress and its underlying linguistic functionality. However, the mechanism that connects stress cues to the listener's attentional processing has remained unclear. Also, …