André F. T. Martins
Sample-efficient Integration of New Modalities into Large Language Models
Multimodal foundation models can process several modalities. However, since the space of possible modalities is large and evolving over time, training a model from scratch to encompass all modalities is unfeasible. Moreover, integrating a …
Risk Factors for Early-Onset Neonatal Sepsis (Fatores de risco da sepse neonatal precoce)
Rationale/Problem: Early-onset neonatal sepsis (EOS) occurs within the first 72 hours of a newborn's life through vertical transmission of microorganisms and carries high neonatal mortality, particularly in prem…
Should We Still Pretrain Encoders with Masked Language Modeling?
Learning high-quality text representations is fundamental to a wide range of NLP tasks. While encoder pretraining has traditionally relied on Masked Language Modeling (MLM), recent evidence suggests that decoder models pretrained with Caus…
Long-Context Generalization with Sparse Attention
Transformer-based architectures traditionally employ softmax to compute attention weights, which produces dense distributions over all tokens in a sequence. While effective in many settings, this density has been shown to be detrimental fo…
EuroLLM-9B: Technical Report
This report presents EuroLLM-9B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of Europea…
Different Speech Translation Models Encode and Translate Speaker Gender Differently
Recent studies on interpreting the hidden states of speech models have shown their ability to capture speaker-specific features, including gender. Does this finding also hold for speech translation (ST) models? If so, what are the implicat…
Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering
Despite the steady progress in machine translation evaluation, existing automatic metrics struggle to capture how well meaning is preserved beyond sentence boundaries. We posit that reliance on a single intrinsic quality score, trained to …
EuroBERT: Scaling Multilingual Encoders for European Languages
General-purpose multilingual vector representations, used in retrieval, regression and classification, are traditionally obtained from bidirectional encoder models. Despite their wide applicability, encoders have been recently overshadowed…
LegalBench.PT: A Benchmark for Portuguese Law
The recent application of LLMs to the legal field has spurred the creation of benchmarks across various jurisdictions and languages. However, no benchmark has yet been specifically designed for the Portuguese legal system. In this work, we…
Sparse Activations as Conformal Predictors
Conformal prediction is a distribution-free framework for uncertainty quantification that replaces point predictions with sets, offering marginal coverage guarantees (i.e., ensuring that the prediction sets contain the true label with a sp…
Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designin…
AdaSplash: Adaptive Sparse Flash Attention
The computational cost of softmax-based attention in transformers limits their applicability to long-context tasks. Adaptive sparsity, of which $α$-entmax attention is an example, offers a flexible data-dependent alternative, but existing …
Fenchel-Young Variational Learning
From a variational perspective, many statistical learning criteria involve seeking a distribution that balances empirical risk and regularization. In this paper, we broaden this perspective by introducing a new general class of variational…
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
Universal Dependencies
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research f…
From Tower to Spire: Adding the Speech Modality to a Translation-Specialist LLM
Did Translation Models Get More Robust Without Anyone Even Noticing?
Discrete Latent Structure in Neural Networks
Construction-Based Reduction of Translationese for Low-Resource Languages: A Pilot Study on Bavarian
A Context-aware Framework for Translation-mediated Conversations
Automatic translation systems offer a powerful solution to bridge language barriers in scenarios where participants do not share a common language. However, these systems can introduce errors leading to misunderstandings and conversation b…
Conformalizing Machine Translation Evaluation
Several uncertainty estimation methods have been recently proposed for machine translation evaluation. While these methods can provide a useful indication of when not to trust model predictions, we show in this paper that the majority of t…
Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval
Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified…
Findings of the WMT 2024 Shared Task on Chat Translation
This paper presents the findings from the third edition of the Chat Translation Shared Task. As with previous editions, the task involved translating bilingual customer support conversations, specifically focusing on the impact of conversa…
Watching the Watchers: Exposing Gender Disparities in Machine Translation Quality Estimation
Quality estimation (QE), the automatic assessment of translation quality, has recently become crucial across several stages of the translation pipeline, from data curation to training and decoding. While QE metrics have been optimized to ali…
Modeling User Preferences with Automatic Metrics: Creating a High-Quality Preference Dataset for Machine Translation
Alignment with human preferences is an important step in developing accurate and safe large language models. This is no exception in machine translation (MT), where better handling of language nuances and context-specific variations leads …
EuroLLM: Multilingual Language Models for Europe
The quality of open-weight LLMs has seen significant improvement, yet they remain predominantly focused on English. In this paper, we introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs capable of un…