Explanipedia

Can They Dixit? Yes they Can! Dixit as a Playground for Multimodal Language Model Capabilities

Balepur, Nishant, Nguyen, Dang, Ki, Dayeon · 2025

Multi-modal large language models (MLMs) are often assessed on static, individual benchmarks -- which cannot jointly assess MLM capabilities in a single task -- or rely on human or model pairwise comparisons -- which are highly subjective, …

Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG

Ki, Dayeon, Carpuat, Marine, McNamee, Paul, Khashabi, Daniel, Yang, Eugene, et al. · 2025

Multilingual Retrieval-Augmented Generation (mRAG) systems enable language models to answer knowledge-intensive queries with citation-supported responses across languages. While such systems have been proposed, an open question is whether…

GraphicBench: A Planning Benchmark for Graphic Design with Language Agents

Ki, Dayeon, Zhou, Tianyi, Carpuat, Marine, Wu, Gang, Mathur, Puneet, et al. · 2025

Large Language Model (LLM)-powered agents have unlocked new possibilities for automating human tasks. While prior work has focused on well-defined tasks with specified goals, the capabilities of agents in creative design tasks with open-en…

AskQE: Question Answering as Automatic Evaluation for Machine Translation

Ki, Dayeon, Duh, Kevin, Carpuat, Marine · 2025

How can a monolingual English speaker determine whether an automatic translation in French is good enough to be shared? Existing MT error detection and quality estimation (QE) techniques do not address this practical scenario. We introduce…
