Explanipedia

Sample-efficient Integration of New Modalities into Large Language Models Open

Osman Batur İnce, André F. T. Martins, Oisin Mac Aodha, Edoardo Maria Ponti · 2025

Multimodal foundation models can process several modalities. However, since the space of possible modalities is large and evolving over time, training a model from scratch to encompass all modalities is unfeasible. Moreover, integrating a …

Fatores de risco da sepse neonatal precoce (Risk factors for early-onset neonatal sepsis) Open

Laura Barbosa Salomé, Neilson Martins de Oliveira, João Ferraz, Sofia Gabriella Gregolini Catellani, Gustavo Frankenstein Martin, et al. · 2025

Rationale/Problem: Early-onset neonatal sepsis (EOS) occurs within the first 72 hours of a newborn's life through vertical transmission of microorganisms and carries high neonatal mortality, mainly among patients who are prem…

Should We Still Pretrain Encoders with Masked Language Modeling? Open

Hippolyte Gisserot-Boukhlef, Nicolas Boizard, Manuel Faysse, Duarte M. Alves, Emmanuel Malherbe, et al. · 2025

Learning high-quality text representations is fundamental to a wide range of NLP tasks. While encoder pretraining has traditionally relied on Masked Language Modeling (MLM), recent evidence suggests that decoder models pretrained with Caus…

Long-Context Generalization with Sparse Attention Open

Pavlo Vasylenko, Hugo Pitorro, André F. T. Martins, Marcos Treviso · 2025

Transformer-based architectures traditionally employ softmax to compute attention weights, which produces dense distributions over all tokens in a sequence. While effective in many settings, this density has been shown to be detrimental fo…

EuroLLM-9B: Technical Report Open

Pedro Henrique Martins, João Alves, Patrick Fernandes, Ricardo Rei, M. Amin Farajian, et al. · 2025

This report presents EuroLLM-9B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of Europea…

Different Speech Translation Models Encode and Translate Speaker Gender Differently Open

Dennis Fucci, Marco Gaido, Matteo Negri, Luisa Bentivogli, André F. T. Martins, et al. · 2025

Recent studies on interpreting the hidden states of speech models have shown their ability to capture speaker-specific features, including gender. Does this finding also hold for speech translation (ST) models? If so, what are the implicat…

Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering Open

Patrick Fernandes, Sweta Agrawal, Emmanouil Zaranis, André F. T. Martins, Graham Neubig · 2025

Despite the steady progress in machine translation evaluation, existing automatic metrics struggle to capture how well meaning is preserved beyond sentence boundaries. We posit that reliance on a single intrinsic quality score, trained to …

EuroBERT: Scaling Multilingual Encoders for European Languages Open

Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte M. Alves, André F. T. Martins, Ayoub Hammal, et al. · 2025

General-purpose multilingual vector representations, used in retrieval, regression and classification, are traditionally obtained from bidirectional encoder models. Despite their wide applicability, encoders have been recently overshadowed…

LegalBench.PT: A Benchmark for Portuguese Law Open

Beatriz Canaverde, Telmo Pires, Laura Ribeiro, André F. T. Martins · 2025

The recent application of LLMs to the legal field has spurred the creation of benchmarks across various jurisdictions and languages. However, no benchmark has yet been specifically designed for the Portuguese legal system. In this work, we…

Sparse Activations as Conformal Predictors Open

Margarida M. Campos, João Calém, Sophia Sklaviadis, Mário A. T. Figueiredo, André F. T. Martins · 2025

Conformal prediction is a distribution-free framework for uncertainty quantification that replaces point predictions with sets, offering marginal coverage guarantees (i.e., ensuring that the prediction sets contain the true label with a sp…

Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral Open

António Farinhas, Nuno Guerreiro, Sweta Agrawal, Ricardo Rei, André F. T. Martins · 2025

Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designin…

AdaSplash: Adaptive Sparse Flash Attention Open

Nuno Gonçalves, Marcos V. Treviso, André F. T. Martins · 2025

The computational cost of softmax-based attention in transformers limits their applicability to long-context tasks. Adaptive sparsity, of which $α$-entmax attention is an example, offers a flexible data-dependent alternative, but existing …

Fenchel-Young Variational Learning Open

Sophia Sklaviadis, Sweta Agrawal, António Farinhas, André F. T. Martins, Mário A. T. Figueiredo · 2025

From a variational perspective, many statistical learning criteria involve seeking a distribution that balances empirical risk and regularization. In this paper, we broaden this perspective by introducing a new general class of variational…

Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning Open

Giuseppe Attanasio, Sonal Sannigrahi, Ben Peters, André F. T. Martins · 2025

Universal Dependencies Open

Joakim Nivre, Żeljko Agić, Lars Ahrenberg, María Jesús Aranzabe, Masayuki Asahara, et al. · 2025

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research f…

From Tower to Spire: Adding the Speech Modality to a Translation-Specialist LLM Open

Kshitij Ambilduke, Ben Peters, Sonal Sannigrahi, Anil Keshwani, Tsz Kin Lam, et al. · 2025

Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation Open

Emmanouil Zaranis, Giuseppe Attanasio, Sweta Agrawal, André F. T. Martins · 2025

Did Translation Models Get More Robust Without Anyone Even Noticing? Open

Ben Peters, André F. T. Martins · 2025

Discrete Latent Structure in Neural Networks Open

Vlad Niculae, Caio Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins · 2025

Construction-Based Reduction of Translationese for Low-Resource Languages: A Pilot Study on Bavarian Open

Peiqin Lin, Marion Thaler, Daniela Goschala, Amir Hossein Kargaran, Yihong Liu, et al. · 2025

EuroLLM: Multilingual Language Models for Europe Open

Pedro Henrique Martins, Patrick Fernandes, João Alves, Nuno Guerreiro, Ricardo Rei, et al. · 2025

A Context-aware Framework for Translation-mediated Conversations Open

José P. Pombal, Sweta Agrawal, Patrick Fernandes, Emmanouil Zaranis, André F. T. Martins · 2024

Automatic translation systems offer a powerful solution to bridge language barriers in scenarios where participants do not share a common language. However, these systems can introduce errors leading to misunderstandings and conversation b…

Conformalizing Machine Translation Evaluation Open

Chrysoula Zerva, André F. T. Martins · 2024

Several uncertainty estimation methods have been recently proposed for machine translation evaluation. While these methods can provide a useful indication of when not to trust model predictions, we show in this paper that the majority of t…

Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval Open

Saul Santos, Vlad Niculae, Daniel McNamee, André F. T. Martins · 2024

Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified…

Findings of the WMT 2024 Shared Task on Chat Translation Open

M. Amin Farajian, António V. Lopes, André F. T. Martins, Sameen Maruf, Gholamreza Haffari · 2024

This paper presents the findings from the third edition of the Chat Translation Shared Task. As with previous editions, the task involved translating bilingual customer support conversations, specifically focusing on the impact of conversa…

Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation Open

Emmanouil Zaranis, Giuseppe Attanasio, Sweta Agrawal, André F. T. Martins · 2024

Quality estimation (QE), the automatic assessment of translation quality, has recently become crucial across several stages of the translation pipeline, from data curation to training and decoding. While QE metrics have been optimized to ali…

Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation Open

Sweta Agrawal, José G. C. de Souza, Ricardo Rei, António Farinhas, Gonçalo S. Faria, et al. · 2024

Alignment with human preferences is an important step in developing accurate and safe large language models. This is no exception in machine translation (MT), where better handling of language nuances and context-specific variations leads …

EuroLLM: Multilingual Language Models for Europe Open

Pedro Henrique Martins, Patrick Fernandes, João Alves, Nuno Guerreiro, Ricardo Rei, et al. · 2024

The quality of open-weight LLMs has seen significant improvement, yet they remain predominantly focused on English. In this paper, we introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs capable of un…

André F. T. Martins