Luisa Bentivogli
The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models
Contrastive explanations, which indicate why an AI system produced one output (the target) instead of another (the foil), are widely regarded in explainable AI as more informative and interpretable than standard explanations. However, obta…
Cross-Attention is Half Explanation in Speech-to-Text Models
Cross-attention is a core mechanism in encoder-decoder architectures, widespread in many fields, including speech-to-text (S2T) processing. Its scores have been repurposed for various downstream applications--such as timestamp estimation a…
Better Late Than Never: Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation
Simultaneous speech-to-text translation (SimulST) systems have to balance translation quality with latency--the delay between speech input and the translated output. While quality evaluation is well established, accurate latency measuremen…
An Interdisciplinary Approach to Human-Centered Machine Translation
Machine Translation (MT) tools are widely used today, often in contexts where professional translators are not present. Despite progress in MT technology, a gap persists between system development and real-world usage, particularly for non…
Different Speech Translation Models Encode and Translate Speaker Gender Differently
Recent studies on interpreting the hidden states of speech models have shown their ability to capture speaker-specific features, including gender. Does this finding also hold for speech translation (ST) models? If so, what are the implicat…
Echoes of Phonetics: Unveiling Relevant Acoustic Cues for ASR via Feature Attribution
Despite significant advances in ASR, the specific acoustic cues models rely on remain unclear. Prior studies have examined such cues on a limited set of phonemes and outdated models. In this work, we apply a feature attribution technique t…
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
The development of speech foundation models (SFMs) like Whisper and SeamlessM4T has significantly advanced the field of speech processing. However, their closed nature--with inaccessible training data and code--poses major reproducibility …
A decade of gender bias in machine translation
Gender bias in machine translation (MT) has been studied for over a decade, a time marked by societal, linguistic, and technological shifts. With the early optimism for a quick solution in mind, we review over 100 studies on the topic and …
Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions
Converging societal and technical factors have transformed language technologies into user-facing applications used by the general public across languages. Machine Translation (MT) has become a global tool, with cross-lingual services now …
Mind the Inclusivity Gap: Multilingual Gender-Neutral Translation Evaluation with mGeNTE
Avoiding the propagation of undue (binary) gender inferences and default masculine language remains a key challenge towards inclusive multilingual technologies, particularly when translating into languages with extensive gendered morpholog…
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Following the remarkable success of Large Language Models (LLMs) in NLP tasks, there is increasing interest in extending their capabilities to speech -- the most common form of communication. The most widespread approach to integrating spe…
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
Findings of the IWSLT 2025 Evaluation Campaign
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Gender-fair language aims at promoting gender equality by using terms and expressions that include all identities and avoid reinforcing gender stereotypes. Implementing gender-fair strategies is particularly challenging in heavily gender-m…
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Spurred by the demand for interpretable models, research on eXplainable AI for language technologies has experienced significant growth, with feature attribution methods emerging as a cornerstone of this progress. While prior work in NLP e…
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models. However, existing speech FMs (SFMs) fall short of full compliance with the ope…
What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
Gender bias in machine translation (MT) is recognized as an issue that can harm people and society. And yet, advancements in the field rarely involve people, the final MT users, or inform how they might be impacted by biased technologies. …
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
The remarkable performance achieved by Large Language Models (LLMs) has driven research efforts to leverage them for a wide range of tasks and input modalities. In speech-to-text (S2T) tasks, the emerging solution consists of projecting the…
SimulSeamless: FBK at IWSLT 2024 Simultaneous Speech Translation
This paper describes FBK's participation in the Simultaneous Translation Evaluation Campaign at IWSLT 2024. For this year's submission in the speech-to-text translation (ST) sub-track, we propose SimulSeamless, which is realized by com…
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
Streaming speech-to-text translation (StreamST) is the task of automatically translating speech while incrementally receiving an audio stream. Unlike simultaneous ST (SimulST), which deals with pre-segmented speech, StreamST faces the chal…
SBAAM! Eliminating Transcript Dependency in Automatic Subtitling
Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that g…
Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models
Machine translation (MT) models are known to suffer from gender bias, especially when translating into languages with extensive gendered morphology. Accordingly, they still fall short in using gender-inclusive language, also representative…
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
The attention mechanism, a cornerstone of state-of-the-art neural models, faces computational hurdles in processing long sequences due to its quadratic complexity. Consequently, research efforts in the last few years focused on finding mor…
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
The field of natural language processing (NLP) has recently witnessed a transformative shift with the emergence of foundation models, particularly Large Language Models (LLMs) that have revolutionized text-based NLP. This paradigm has exte…