Furu Wei
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
Multimodal embedding models, built upon causal Vision Language Models (VLMs), have shown promise in various tasks. However, current approaches face three key limitations: the use of causal attention in VLM backbones is suboptimal for embed…
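The first limitation the abstract names, causal attention being suboptimal for embeddings, is easy to picture in code. Below is a generic sketch (not MoCa's actual continual pre-training recipe) of the bidirectional alternative: let every token attend to every other token, then mean-pool token states into a single embedding.

```python
import torch

# Generic sketch, not MoCa's training recipe: replace the causal mask of the
# VLM backbone with a bidirectional one, then mean-pool into one embedding.

def causal_mask(seq_len: int) -> torch.Tensor:
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def bidirectional_mask(seq_len: int) -> torch.Tensor:
    return torch.ones(seq_len, seq_len, dtype=torch.bool)  # all positions visible

def mean_pool(hidden_states: torch.Tensor, padding_mask: torch.Tensor) -> torch.Tensor:
    """hidden_states: (batch, seq, dim); padding_mask: (batch, seq), 1 = real token."""
    m = padding_mask.unsqueeze(-1).to(hidden_states.dtype)
    return (hidden_states * m).sum(dim=1) / m.sum(dim=1).clamp(min=1.0)
```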
Reinforcement Pre-Training
In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained using RL, where it …
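A toy sketch of the idea stated in the abstract: next-token prediction becomes a reasoning task with a reward that is verifiable by construction, namely matching the corpus next token. The method names on `model` below are hypothetical placeholders, not the paper's API.

```python
# Sketch of RPT's core loop as described in the abstract. The model methods
# (sample_reasoning_and_prediction, policy_update) are hypothetical stand-ins.

def next_token_reward(predicted_token: str, ground_truth_token: str) -> float:
    """Verifiable reward: 1 if the committed token matches the corpus, else 0."""
    return 1.0 if predicted_token == ground_truth_token else 0.0

def rpt_step(model, context_tokens, ground_truth_token, num_rollouts=8):
    trajectories, rewards = [], []
    for _ in range(num_rollouts):
        # The model first emits a reasoning trace, then commits to one token.
        trace, prediction = model.sample_reasoning_and_prediction(context_tokens)
        trajectories.append((trace, prediction))
        rewards.append(next_token_reward(prediction, ground_truth_token))
    # Any policy-gradient update over the sampled group fits here.
    model.policy_update(trajectories, rewards)
```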
Think Only When You Need with Large Hybrid-Reasoning Models
Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessive…
Efficient RL Training for Reasoning Models via Length-Aware Optimization
Large reasoning models, such as OpenAI o1 or DeepSeek R1, have demonstrated remarkable performance on reasoning tasks but often incur a long reasoning path with significant memory and time costs. Existing methods primarily aim to shorten r…
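The abstract is truncated before the method details, so the following is a generic illustration of length-aware reward shaping rather than the paper's objective: correctness pays, and tokens spent eat into the reward.

```python
# Generic length-aware reward shaping, NOT the paper's specific objective
# (the abstract cuts off before describing it).

def length_aware_reward(correct: bool, num_tokens: int,
                        budget: int = 4096, alpha: float = 0.2) -> float:
    """Reward correct answers, discounted by the fraction of the token budget used."""
    base = 1.0 if correct else 0.0
    return base - alpha * min(num_tokens / budget, 1.0)
```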
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Efficient deployment of 1-bit Large Language Models (LLMs) is hindered by activation outliers, which complicate quantization to low bit-widths. We introduce BitNet v2, a novel framework enabling native 4-bit activation quantization for 1-b…
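A numpy sketch of the mechanism named in the title (not the paper's kernels): an orthonormal Hadamard rotation spreads activation outliers across channels, so a symmetric 4-bit quantizer needs a much smaller step size.

```python
import numpy as np

def hadamard_transform(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform along the last axis
    (length must be a power of two)."""
    h = x.astype(np.float64).copy()
    n = h.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    step = 1
    while step < n:
        h = h.reshape(*h.shape[:-1], n // (2 * step), 2, step)
        a, b = h[..., 0, :], h[..., 1, :]
        h = np.stack([a + b, a - b], axis=-2).reshape(*h.shape[:-3], n)
        step *= 2
    return h / np.sqrt(n)

def quantize_int4(x: np.ndarray):
    """Symmetric per-tensor 4-bit quantization to integers in [-8, 7]."""
    scale = np.abs(x).max() / 7.0 + 1e-12
    return np.clip(np.round(x / scale), -8, 7), scale

# One outlier channel dominates the plain quantization scale; after rotation
# its energy is spread across all channels and the step size shrinks.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 1024))
x[0, -1] *= 50.0                      # synthetic activation outlier
_, s_plain = quantize_int4(x)
_, s_rot = quantize_int4(hadamard_transform(x))
print(f"quantization step without rotation: {s_plain:.3f}, with: {s_rot:.3f}")
```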
Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks. However, recent studies have exposed critical limitations in their spatial reasoning capabilities. This deficiency in spati…
BitNet b1.58 2B4T Technical Report
We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering l…
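The abstract does not restate the quantization scheme, but the b1.58 naming refers to ternary weights; a minimal numpy sketch of the absmean quantizer from this line of work follows.

```python
import numpy as np

# Absmean ternary quantizer from the BitNet b1.58 line of work: scale weights
# by their mean absolute value, then round into {-1, 0, +1}
# (about 1.58 bits per weight).

def absmean_ternary_quantize(w: np.ndarray):
    gamma = np.abs(w).mean() + 1e-12          # per-tensor scale
    w_ternary = np.clip(np.round(w / gamma), -1, 1)
    return w_ternary, gamma                   # dequantize as w_ternary * gamma
```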
Model as a Game: On Numerical and Spatial Consistency for Generative Games
Recent advances in generative models have significantly impacted game generation. However, despite producing high-quality graphics and responding adequately to player input, existing models often fail to maintain fundamental game properties su…
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Recent studies have shown that making a model spend more time thinking through longer Chains of Thought (CoTs) enables it to gain significant improvements in complex reasoning tasks. While current research continues to explore the benefit…
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ternary LLMs remain scarce. To bridge t…
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
Multimodal embedding models have gained significant attention for their ability to map data from different modalities, such as text and images, into a unified representation space. However, the limited labeled multimodal data often hinders…
Examining False Positives under Inference Scaling for Mathematical Reasoning
Recent advancements in language models have led to significant improvements in mathematical reasoning across various benchmarks. However, most of these benchmarks rely on automatic evaluation methods that only compare final answers using h…
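A toy illustration (not the paper's code) of the pitfall the abstract describes: a grader that compares only the final answer accepts solutions whose reasoning is invalid, producing false positives.

```python
# Answer-only grading accepts flawed reasoning that lands on the right answer.

def answer_only_grade(model_output: str, gold: str) -> bool:
    """Heuristic grader: normalize and compare the last line only."""
    final_line = model_output.strip().splitlines()[-1]
    return final_line.strip().lower() == gold.strip().lower()

flawed = "2^3 means 2 + 2 + 2 + 2, which equals 8.\n8"   # wrong reasoning
assert answer_only_grade(flawed, "8")                     # still graded correct
```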
Chain-of-Retrieval Augmented Generation
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually perform a single retrieval step before t…
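A minimal sketch of the step-by-step retrieve-then-reason loop the abstract describes. The `retriever`/`llm` interfaces and the SEARCH/ANSWER protocol are illustrative assumptions, not the paper's implementation.

```python
# Iterative retrieval before answering, instead of a single retrieval step.
# All interfaces here (retriever.search, llm.generate) are hypothetical.

def chain_of_retrieval(llm, retriever, question: str, max_steps: int = 4) -> str:
    evidence, query = [], question
    for _ in range(max_steps):
        evidence.extend(retriever.search(query, k=5))
        step = llm.generate(
            f"Question: {question}\nEvidence so far: {evidence}\n"
            "Reason over the evidence. Reply 'SEARCH: <next query>' if more "
            "information is needed, otherwise 'ANSWER: <final answer>'."
        )
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        query = step.removeprefix("SEARCH:").strip()
    # Retrieval budget exhausted: answer from the accumulated evidence.
    return llm.generate(f"Question: {question}\nEvidence: {evidence}\nAnswer:")
```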
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet often rely on single-paradigm reasoning, limiting their effectiveness across diverse tasks. We introduce Chain-of-Reasoning (CoR), a novel unified frame…
GeAR: Generation Augmented Retrieval
Document retrieval techniques form the foundation for the development of large-scale information systems. The prevailing methodology is to construct a bi-encoder and compute the semantic similarity. However, such scalar similarity is diffi…
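For context, this is what the abstract means by scalar similarity: a bi-encoder embeds query and documents independently, and relevance collapses to one number per document, which ranks well but explains nothing. A minimal sketch of that baseline:

```python
import numpy as np

# Bi-encoder baseline: relevance reduces to a single cosine score per document.

def cosine_scores(query_vec: np.ndarray, doc_matrix: np.ndarray) -> np.ndarray:
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    d = doc_matrix / (np.linalg.norm(doc_matrix, axis=1, keepdims=True) + 1e-12)
    return d @ q    # shape (num_docs,): one opaque scalar per document
```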
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
The geologic map, a fundamental diagram in the geosciences, provides critical insights into the structure and composition of Earth's subsurface and surface. These maps are indispensable in various fields, including disaster detection, resou…
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
Multiple-choice question (MCQ) datasets like Massive Multitask Language Understanding (MMLU) are widely used to evaluate the commonsense, understanding, and problem-solving abilities of large language models (LLMs). However, the open-sourc…
Context-DPO: Aligning Language Models for Context-Faithfulness
Reliable responses from large language models (LLMs) require adherence to user instructions and retrieved information. While alignment techniques help LLMs align with human intentions and values, improving context-faithfulness through alig…
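The title names DPO, so the standard Direct Preference Optimization objective is the relevant building block; per the abstract, the pairing would use context-faithful responses as "chosen" and unfaithful ones as "rejected". The code below is the generic DPO loss, not code from the paper.

```python
import torch
import torch.nn.functional as F

# Standard DPO loss. Inputs are per-example response log-probabilities
# (summed over tokens) under the policy and the frozen reference model.

def dpo_loss(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor, ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```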
Multimodal Latent Language Modeling with Next-Token Diffusion
Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video). In this work, we propose Latent Language Modeling (LatentLM), which seamlessly inte…
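One reading of "next-token diffusion" from the title: a single autoregressive backbone emits a hidden state per position; discrete positions go through a softmax head, while continuous positions (e.g., VAE latents) are produced by a small diffusion head conditioned on that hidden state. The module shapes below are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

# Illustrative two-headed output for a unified discrete/continuous
# autoregressive model; all shapes and layers are assumptions.

class NextTokenHeads(nn.Module):
    def __init__(self, dim: int, vocab_size: int, latent_dim: int):
        super().__init__()
        self.lm_head = nn.Linear(dim, vocab_size)        # discrete next token
        self.denoiser = nn.Sequential(                   # continuous next latent
            nn.Linear(dim + latent_dim + 1, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, latent_dim),
        )

    def discrete_logits(self, h: torch.Tensor) -> torch.Tensor:
        return self.lm_head(h)

    def denoise(self, h: torch.Tensor, noisy_latent: torch.Tensor,
                t: float) -> torch.Tensor:
        """One denoising step: predict the clean latent from the backbone
        state h, the current noisy latent, and the diffusion timestep t."""
        t_col = torch.full((h.size(0), 1), t, dtype=h.dtype, device=h.device)
        return self.denoiser(torch.cat([h, noisy_latent, t_col], dim=-1))
```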