Furu Wei
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
Multimodal embedding models, built upon causal Vision Language Models (VLMs), have shown promise in various tasks. However, current approaches face three key limitations: the use of causal attention in VLM backbones is suboptimal for embed…
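The first limitation the abstract names, causal attention being suboptimal for embeddings, is easy to picture in code. Below is a generic sketch (not MoCa's actual continual pre-training recipe) of the bidirectional alternative: let every token attend to every other token, then mean-pool token states into a single embedding.

```python
import torch

# Generic sketch, not MoCa's training recipe: replace the causal mask of the
# VLM backbone with a bidirectional one, then mean-pool into one embedding.

def causal_mask(seq_len: int) -> torch.Tensor:
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def bidirectional_mask(seq_len: int) -> torch.Tensor:
    return torch.ones(seq_len, seq_len, dtype=torch.bool)  # all positions visible

def mean_pool(hidden_states: torch.Tensor, padding_mask: torch.Tensor) -> torch.Tensor:
    """hidden_states: (batch, seq, dim); padding_mask: (batch, seq), 1 = real token."""
    m = padding_mask.unsqueeze(-1).to(hidden_states.dtype)
    return (hidden_states * m).sum(dim=1) / m.sum(dim=1).clamp(min=1.0)
```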
Reinforcement Pre-Training
In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained using RL, where it …
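A toy sketch of the idea stated in the abstract: next-token prediction becomes a reasoning task with a reward that is verifiable by construction, namely matching the corpus next token. The method names on `model` below are hypothetical placeholders, not the paper's API.

```python
# Sketch of RPT's core loop as described in the abstract. The model methods
# (sample_reasoning_and_prediction, policy_update) are hypothetical stand-ins.

def next_token_reward(predicted_token: str, ground_truth_token: str) -> float:
    """Verifiable reward: 1 if the committed token matches the corpus, else 0."""
    return 1.0 if predicted_token == ground_truth_token else 0.0

def rpt_step(model, context_tokens, ground_truth_token, num_rollouts=8):
    trajectories, rewards = [], []
    for _ in range(num_rollouts):
        # The model first emits a reasoning trace, then commits to one token.
        trace, prediction = model.sample_reasoning_and_prediction(context_tokens)
        trajectories.append((trace, prediction))
        rewards.append(next_token_reward(prediction, ground_truth_token))
    # Any policy-gradient update over the sampled group fits here.
    model.policy_update(trajectories, rewards)
```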
Think Only When You Need with Large Hybrid-Reasoning Models
Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessive…
Efficient RL Training for Reasoning Models via Length-Aware Optimization
Large reasoning models, such as OpenAI o1 or DeepSeek R1, have demonstrated remarkable performance on reasoning tasks but often incur a long reasoning path with significant memory and time costs. Existing methods primarily aim to shorten r…
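The abstract is truncated before the method details, so the following is a generic illustration of length-aware reward shaping rather than the paper's objective: correctness pays, and tokens spent eat into the reward.

```python
# Generic length-aware reward shaping, NOT the paper's specific objective
# (the abstract cuts off before describing it).

def length_aware_reward(correct: bool, num_tokens: int,
                        budget: int = 4096, alpha: float = 0.2) -> float:
    """Reward correct answers, discounted by the fraction of the token budget used."""
    base = 1.0 if correct else 0.0
    return base - alpha * min(num_tokens / budget, 1.0)
```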
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
Efficient deployment of 1-bit Large Language Models (LLMs) is hindered by activation outliers, which complicate quantization to low bit-widths. We introduce BitNet v2, a novel framework enabling native 4-bit activation quantization for 1-b…
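A numpy sketch of the mechanism named in the title (not the paper's kernels): an orthonormal Hadamard rotation spreads activation outliers across channels, so a symmetric 4-bit quantizer needs a much smaller step size.

```python
import numpy as np

def hadamard_transform(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform along the last axis
    (length must be a power of two)."""
    h = x.astype(np.float64).copy()
    n = h.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    step = 1
    while step < n:
        h = h.reshape(*h.shape[:-1], n // (2 * step), 2, step)
        a, b = h[..., 0, :], h[..., 1, :]
        h = np.stack([a + b, a - b], axis=-2).reshape(*h.shape[:-3], n)
        step *= 2
    return h / np.sqrt(n)

def quantize_int4(x: np.ndarray):
    """Symmetric per-tensor 4-bit quantization to integers in [-8, 7]."""
    scale = np.abs(x).max() / 7.0 + 1e-12
    return np.clip(np.round(x / scale), -8, 7), scale

# One outlier channel dominates the plain quantization scale; after rotation
# its energy is spread across all channels and the step size shrinks.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 1024))
x[0, -1] *= 50.0                      # synthetic activation outlier
_, s_plain = quantize_int4(x)
_, s_rot = quantize_int4(hadamard_transform(x))
print(f"quantization step without rotation: {s_plain:.3f}, with: {s_rot:.3f}")
```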
Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks. However, recent studies have exposed critical limitations in their spatial reasoning capabilities. This deficiency in spati…
BitNet b1.58 2B4T Technical Report
We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering l…
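The abstract does not restate the quantization scheme, but the b1.58 naming refers to ternary weights; a minimal numpy sketch of the absmean quantizer from this line of work follows.

```python
import numpy as np

# Absmean ternary quantizer from the BitNet b1.58 line of work: scale weights
# by their mean absolute value, then round into {-1, 0, +1}
# (about 1.58 bits per weight).

def absmean_ternary_quantize(w: np.ndarray):
    gamma = np.abs(w).mean() + 1e-12          # per-tensor scale
    w_ternary = np.clip(np.round(w / gamma), -1, 1)
    return w_ternary, gamma                   # dequantize as w_ternary * gamma
```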
Model as a Game: On Numerical and Spatial Consistency for Generative Games
Recent advances in generative models have significantly impacted game generation. However, despite producing high-quality graphics and responding adequately to player input, existing models often fail to maintain fundamental game properties su…
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Recent studies have shown that making a model spend more time thinking through longer Chains of Thought (CoTs) enables it to gain significant improvements in complex reasoning tasks. While current research continues to explore the benefit…
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ternary LLMs remain scarce. To bridge t…
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
Multimodal embedding models have gained significant attention for their ability to map data from different modalities, such as text and images, into a unified representation space. However, the limited labeled multimodal data often hinders…
Examining False Positives under Inference Scaling for Mathematical Reasoning
Recent advancements in language models have led to significant improvements in mathematical reasoning across various benchmarks. However, most of these benchmarks rely on automatic evaluation methods that only compare final answers using h…
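A toy illustration (not the paper's code) of the pitfall the abstract describes: a grader that compares only the final answer accepts solutions whose reasoning is invalid, producing false positives.

```python
# Answer-only grading accepts flawed reasoning that lands on the right answer.

def answer_only_grade(model_output: str, gold: str) -> bool:
    """Heuristic grader: normalize and compare the last line only."""
    final_line = model_output.strip().splitlines()[-1]
    return final_line.strip().lower() == gold.strip().lower()

flawed = "2^3 means 2 + 2 + 2 + 2, which equals 8.\n8"   # wrong reasoning
assert answer_only_grade(flawed, "8")                     # still graded correct
```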
Chain-of-Retrieval Augmented Generation
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually perform a single retrieval step before t…
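A minimal sketch of the step-by-step retrieve-then-reason loop the abstract describes. The `retriever`/`llm` interfaces and the SEARCH/ANSWER protocol are illustrative assumptions, not the paper's implementation.

```python
# Iterative retrieval before answering, instead of a single retrieval step.
# All interfaces here (retriever.search, llm.generate) are hypothetical.

def chain_of_retrieval(llm, retriever, question: str, max_steps: int = 4) -> str:
    evidence, query = [], question
    for _ in range(max_steps):
        evidence.extend(retriever.search(query, k=5))
        step = llm.generate(
            f"Question: {question}\nEvidence so far: {evidence}\n"
            "Reason over the evidence. Reply 'SEARCH: <next query>' if more "
            "information is needed, otherwise 'ANSWER: <final answer>'."
        )
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        query = step.removeprefix("SEARCH:").strip()
    # Retrieval budget exhausted: answer from the accumulated evidence.
    return llm.generate(f"Question: {question}\nEvidence: {evidence}\nAnswer:")
```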
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet often rely on single-paradigm reasoning, limiting their effectiveness across diverse tasks. We introduce Chain-of-Reasoning (CoR), a novel unified frame…
GeAR: Generation Augmented Retrieval
Document retrieval techniques form the foundation for the development of large-scale information systems. The prevailing methodology is to construct a bi-encoder and compute the semantic similarity. However, such scalar similarity is diffi…
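For context, this is what the abstract means by scalar similarity: a bi-encoder embeds query and documents independently, and relevance collapses to one number per document, which ranks well but explains nothing. A minimal sketch of that baseline:

```python
import numpy as np

# Bi-encoder baseline: relevance reduces to a single cosine score per document.

def cosine_scores(query_vec: np.ndarray, doc_matrix: np.ndarray) -> np.ndarray:
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    d = doc_matrix / (np.linalg.norm(doc_matrix, axis=1, keepdims=True) + 1e-12)
    return d @ q    # shape (num_docs,): one opaque scalar per document
```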
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
The geologic map, a fundamental diagram in the geosciences, provides critical insights into the structure and composition of Earth's subsurface and surface. These maps are indispensable in various fields, including disaster detection, resou…
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
Multiple-choice question (MCQ) datasets like Massive Multitask Language Understanding (MMLU) are widely used to evaluate the commonsense, understanding, and problem-solving abilities of large language models (LLMs). However, the open-sourc…
Context-DPO: Aligning Language Models for Context-Faithfulness
Reliable responses from large language models (LLMs) require adherence to user instructions and retrieved information. While alignment techniques help LLMs align with human intentions and values, improving context-faithfulness through alig…
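The title names DPO, so the standard Direct Preference Optimization objective is the relevant building block; per the abstract, the pairing would use context-faithful responses as "chosen" and unfaithful ones as "rejected". The code below is the generic DPO loss, not code from the paper.

```python
import torch
import torch.nn.functional as F

# Standard DPO loss. Inputs are per-example response log-probabilities
# (summed over tokens) under the policy and the frozen reference model.

def dpo_loss(logp_chosen: torch.Tensor, logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor, ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```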
Multimodal Latent Language Modeling with Next-Token Diffusion
Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video). In this work, we propose Latent Language Modeling (LatentLM), which seamlessly inte…
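One reading of "next-token diffusion" from the title: a single autoregressive backbone emits a hidden state per position; discrete positions go through a softmax head, while continuous positions (e.g., VAE latents) are produced by a small diffusion head conditioned on that hidden state. The module shapes below are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

# Illustrative two-headed output for a unified discrete/continuous
# autoregressive model; all shapes and layers are assumptions.

class NextTokenHeads(nn.Module):
    def __init__(self, dim: int, vocab_size: int, latent_dim: int):
        super().__init__()
        self.lm_head = nn.Linear(dim, vocab_size)        # discrete next token
        self.denoiser = nn.Sequential(                   # continuous next latent
            nn.Linear(dim + latent_dim + 1, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, latent_dim),
        )

    def discrete_logits(self, h: torch.Tensor) -> torch.Tensor:
        return self.lm_head(h)

    def denoise(self, h: torch.Tensor, noisy_latent: torch.Tensor,
                t: float) -> torch.Tensor:
        """One denoising step: predict the clean latent from the backbone
        state h, the current noisy latent, and the diffusion timestep t."""
        t_col = torch.full((h.size(0), 1), t, dtype=h.dtype, device=h.device)
        return self.denoiser(torch.cat([h, noisy_latent, t_col], dim=-1))
```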