Raphael Tang
Drawing Conclusions from Draws: Rethinking Preference Semantics in Arena-Style LLM Evaluation
In arena-style evaluation of large language models (LLMs), two LLMs respond to a user query, and the user chooses the winning response or deems the "battle" a draw, resulting in an adjustment to the ratings of both models. The prevailing a…
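The rating adjustment described here follows the usual arena convention of Elo-style updates, where a draw is scored as half a win for each model. Below is a minimal Python sketch of that conventional update; the rating values and K-factor are illustrative and are not taken from the paper.

# Minimal Elo-style rating update for an arena battle; a draw counts as 0.5.
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of model A against model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a draw."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (outcome - e_a)
    new_b = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return new_a, new_b

# A draw between a higher- and a lower-rated model moves their ratings closer:
print(update(1200.0, 1000.0, 0.5))  # the stronger model loses points, the weaker gains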
Geo-R1: Unlocking VLM Geospatial Reasoning with Cross-View Reinforcement Learning
We introduce Geo-R1, a reasoning-centric post-training framework that unlocks geospatial reasoning in vision-language models by combining thinking scaffolding and elevating. In the scaffolding stage, Geo-R1 instills a "geospatial thinking…
Lost in Embeddings: Information Loss in Vision-Language Models
Vision-language models (VLMs) often process visual inputs through a pretrained vision encoder, followed by a projection into the language model's embedding space via a connector component. While crucial for modality fusion, the potential …
WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
Pretrained automatic speech recognition (ASR) models such as Whisper perform well but still need domain adaptation to handle unseen vocabulary and parlance. In many real-world settings, collecting speech data is impractical, necessitating …
Geospatial Foundational Embedder: Top-1 Winning Solution on EarthVision Embed2Scale Challenge (CVPR 2025)
The EarthVision Embed2Scale challenge (CVPR 2025) aims to develop foundational geospatial models to embed SSL4EO-S12 hyperspectral geospatial data cubes into embedding vectors that facilitate various downstream tasks, e.g., classification, r…
Multilingual Language Model Pretraining using Machine-translated Data
High-resource languages such as English enable the pretraining of high-quality large language models (LLMs). The same cannot be said for most other languages, as LLMs still underperform for non-English languages, likely due to a gap in t…
Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language
English, as a very high-resource language, enables the pretraining of high-quality large language models (LLMs). The same cannot be said for most other languages, as leading LLMs still underperform for non-English languages, likely due to …
View article: "Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time
"Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time Open
Customer service is how companies interface with their customers. It can contribute heavily towards the overall customer satisfaction. However, high-quality service can become expensive, creating an incentive to make it as cost efficien…
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture
Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curat…
Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation
Diffusion models are the state of the art in text-to-image generation, but their perceptual variability remains understudied. In this paper, we examine how prompts affect image variability in black-box diffusion-based models. We propose W1…
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Recent advances in retrieval-augmented models for image captioning highlight the benefit of retrieving related captions for efficient, lightweight models with strong domain-transfer capabilities. While these models demonstrate the success …
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models
Listwise rerankers based on large language models (LLM) are the zero-shot state-of-the-art. However, current works in this direction all depend on the GPT models, making it a single point of failure in scientific reproducibility. Moreover,…
What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond? To bypass their refusal to "speak," we study this research question by probing contextualized embeddings and exploring whether this bias is…
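As a rough illustration of what probing contextualized embeddings can look like, the sketch below fits a linear probe on placeholder hidden-state vectors. The array shapes, the binary labels, and the use of scikit-learn's LogisticRegression are assumptions made for illustration, not the paper's setup.

# A minimal linear-probe sketch over (placeholder) contextualized embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 768))   # stand-in for extracted hidden states
labels = rng.integers(0, 2, size=200)      # stand-in for preferred vs. dispreferred

probe = LogisticRegression(max_iter=1000).fit(embeddings[:150], labels[:150])
print("probe accuracy:", probe.score(embeddings[150:], labels[150:]))
# Held-out accuracy well above chance would indicate the preference is linearly decodable.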
Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models
Large language models (LLMs) exhibit positional bias in how they use context, which especially complicates listwise ranking. To address this, we propose permutation self-consistency, a form of self-consistency over ranking list outputs of …
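A minimal sketch of the shuffle-and-aggregate idea: rank the same list under several input orders and combine the resulting rankings. The aggregation by mean position and the rank_with_llm callable are illustrative stand-ins, not necessarily the paper's exact aggregation or model interface.

# Permutation self-consistency sketch: shuffle inputs, collect rankings, aggregate.
import random
from statistics import mean

def permutation_self_consistency(items, rank_with_llm, num_shuffles=5, seed=0):
    rng = random.Random(seed)
    positions = {item: [] for item in items}
    for _ in range(num_shuffles):
        shuffled = items[:]
        rng.shuffle(shuffled)
        ranking = rank_with_llm(shuffled)      # hypothetical call; returns items, best first
        for pos, item in enumerate(ranking):
            positions[item].append(pos)
    # Aggregate: order items by their average position across permutations.
    return sorted(items, key=lambda item: mean(positions[item]))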
“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
Deep neural networks (DNNs) are often used for text classification due to their high accuracy. However, DNNs can be computationally intensive, requiring millions of parameters and large amounts of labeled data, which can make them expensiv…
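Assuming the compressor-based method pairs gzip compression lengths with the normalized compression distance (NCD) and a k-nearest-neighbor vote, a minimal sketch looks like the following; the toy training pairs are made up for illustration.

# Parameter-free text classification with gzip: NCD distance plus a kNN vote.
import gzip
from collections import Counter

def clen(s: str) -> int:
    return len(gzip.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    cx, cy, cxy = clen(x), clen(y), clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(query: str, train, k: int = 3) -> str:
    neighbors = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [("the team won the match", "sports"),
         ("stocks fell sharply today", "finance"),
         ("the striker scored twice", "sports"),
         ("the central bank raised rates", "finance")]
print(classify("the goalkeeper saved a penalty", train, k=3))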
Operator Selection and Ordering in a Pipeline Approach to Efficiency Optimizations for Transformers
There exists a wide variety of efficiency methods for natural language processing (NLP) tasks, such as pruning, distillation, dynamic inference, quantization, etc. From a different perspective, we can consider an efficiency method as an op…
What the DAAM: Interpreting Stable Diffusion Using Cross Attention
Raphael Tang, Linqing Liu, Akshat Pandey, Zhiying Jiang, Gefei Yang, Karun Kumar, Pontus Stenetorp, Jimmy Lin, Ferhan Ture. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 20…
Less is More: Parameter-Free Text Classification with Gzip
Deep neural networks (DNNs) are often used for text classification tasks as they usually achieve high levels of accuracy. However, DNNs can be computationally intensive with billions of parameters and large amounts of labeled data, which c…
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
End-to-end automatic speech recognition systems represent the state of the art, but they rely on thousands of hours of manually annotated speech for training, as well as heavyweight computation for inference. Of course, this impedes commer…
What the DAAM: Interpreting Stable Diffusion Using Cross Attention
Large-scale diffusion neural networks represent a substantial milestone in text-to-image generation, but they remain poorly understood, lacking interpretability analyses. In this paper, we perform a text-image attribution analysis on Stabl…
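A schematic sketch of token-level attribution from cross-attention maps, assuming maps of shape (heads, height, width, tokens) have been collected during denoising. The shapes, the nearest-neighbor upscaling, and the function name are illustrative assumptions, not the released DAAM implementation.

# Aggregate a token's cross-attention over heads, layers, and timesteps into a heat map.
import numpy as np

def aggregate_attribution(attn_maps, token_index: int, out_size: int = 64) -> np.ndarray:
    """attn_maps: list of arrays, one per (layer, timestep), shaped (heads, h, w, tokens)."""
    heat = np.zeros((out_size, out_size))
    for attn in attn_maps:
        token_map = attn[..., token_index].mean(axis=0)      # average over heads -> (h, w)
        scale = out_size // token_map.shape[0]                # assumes out_size divisible by h
        heat += np.kron(token_map, np.ones((scale, scale)))   # nearest-neighbor upscale
    return heat / heat.max()                                  # normalize for visualization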
Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers
There exists a wide variety of efficiency methods for natural language processing (NLP) tasks, such as pruning, distillation, dynamic inference, quantization, etc. We can consider an efficiency method as an operator applied on a model. Nat…
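To make the operator view concrete, the toy sketch below applies two illustrative "operators" (magnitude pruning and uniform quantization) to a weight matrix in both orders and checks whether the results agree. These toy operators are assumptions for illustration, not the operators studied in the paper.

# Efficiency methods as operators on a model: does their order of application matter?
import numpy as np

def prune(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize(w: np.ndarray, levels: int = 16) -> np.ndarray:
    scale = np.abs(w).max() / (levels / 2)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
a = quantize(prune(w))    # prune, then quantize
b = prune(quantize(w))    # quantize, then prune
print(np.allclose(a, b))  # the two orders need not commute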
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
Raphael Tang, Karun Kumar, Gefei Yang, Akshat Pandey, Yajie Mao, Vladislav Belyaev, Madhuri Emmadi, Craig Murray, Ferhan Ture, Jimmy Lin. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Trac…
Voice Query Auto Completion
Query auto completion (QAC) is the task of predicting a search engine user’s final query from their intermediate, incomplete query. In this paper, we extend QAC to the streaming voice search setting, where automatic speech recognition syst…
BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression
The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to make trade-offs between model quality and efficiency. This paper aims to address two weaknesses of previous w…
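A minimal sketch of the early exiting idea, assuming one exit classifier per layer and a confidence threshold. The layers and classifiers arguments are hypothetical callables standing in for BERT blocks and their exit heads, not the BERxiT code.

# Early exiting: stop at the first layer whose exit head is confident enough.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_predict(x, layers, classifiers, threshold=0.9):
    hidden = x
    for i, (layer, clf) in enumerate(zip(layers, classifiers)):
        hidden = layer(hidden)
        probs = softmax(clf(hidden))
        if probs.max() >= threshold:           # confident enough: exit early
            return int(probs.argmax()), i + 1  # prediction and number of layers used
    return int(probs.argmax()), len(layers)    # otherwise fall through to the last layer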
The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing
Ji Xin, Raphael Tang, Yaoliang Yu, Jimmy Lin. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.