Furu Wei
Think Only When You Need with Large Hybrid-Reasoning Models
Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessive…
Efficient RL Training for Reasoning Models via Length-Aware Optimization
Large reasoning models, such as OpenAI o1 or DeepSeek R1, have demonstrated remarkable performance on reasoning tasks but often incur a long reasoning path with significant memory and time costs. Existing methods primarily aim to shorten r…
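The abstract contrasts correctness with the memory and time cost of long reasoning paths. As a hedged illustration of the general idea of length-aware optimization (not the paper's specific objective), the sketch below folds response length into a scalar reward; the penalty weight, normalization, and names are assumptions.

```python
# Illustrative length-aware reward shaping for reasoning RL.
# This is a stand-in, not the paper's formulation: the penalty weight
# and normalization below are assumptions made for the example.

def length_aware_reward(is_correct: bool, num_tokens: int,
                        max_tokens: int = 4096, penalty: float = 0.2) -> float:
    """Combine answer correctness with a normalized length penalty."""
    correctness = 1.0 if is_correct else 0.0
    length_cost = penalty * min(num_tokens, max_tokens) / max_tokens
    return correctness - length_cost

# Two correct rollouts: the shorter one receives the higher reward.
print(length_aware_reward(True, 512))   # 0.975
print(length_aware_reward(True, 4096))  # 0.8
```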
Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks. However, recent studies have exposed critical limitations in their spatial reasoning capabilities. This deficiency in spati…
BitNet b1.58 2B4T Technical Report
We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering l…
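For readers unfamiliar with what "native 1-bit" (in practice, 1.58-bit ternary) weights mean numerically, here is a minimal numpy sketch of absmean ternary quantization as commonly described for the BitNet b1.58 family; it is illustrative only and says nothing about the released model's kernels or training.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-6):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale.

    Follows the absmean recipe commonly described for 1.58-bit models:
    scale by the mean absolute weight, round, clip to the ternary range.
    """
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.rint(w / scale), -1, 1).astype(np.int8)
    return w_q, scale  # dequantize approximately as w_q * scale

w = np.random.randn(4, 8).astype(np.float32)
w_q, scale = absmean_ternary_quantize(w)
print(np.unique(w_q))                  # values drawn from {-1, 0, 1}
print(np.abs(w - w_q * scale).mean())  # mean quantization error
```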
Model as a Game: On Numerical and Spatial Consistency for Generative Games
Recent advances in generative models have significantly impacted game generation. However, despite producing high-quality graphics and adequately handling player input, existing models often fail to maintain fundamental game properties su…
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Recent studies have shown that making a model spend more time thinking through longer Chain of Thoughts (CoTs) enables it to gain significant improvements in complex reasoning tasks. While current research continues to explore the benefit…
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ternary LLMs remain scarce. To bridge t…
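One reason ternary LLMs matter for edge inference is that each weight needs at most two bits of storage. The sketch below packs ternary values four to a byte purely to show the memory saving; bitnet.cpp's actual lookup-table kernels and layouts are not reproduced here.

```python
import numpy as np

def pack_ternary(w_q: np.ndarray) -> np.ndarray:
    """Pack ternary values {-1, 0, +1} into 2-bit codes, four per byte.

    Illustrative only: real kernels such as bitnet.cpp use their own
    lookup-table-friendly layouts; this just shows the 4x memory saving
    over int8 storage.
    """
    codes = (w_q.astype(np.int8) + 1).astype(np.uint8)  # {-1,0,1} -> {0,1,2}
    codes = codes.reshape(-1, 4)                         # 4 values per byte
    return (codes[:, 0] | (codes[:, 1] << 2) |
            (codes[:, 2] << 4) | (codes[:, 3] << 6))

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_ternary."""
    vals = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return vals.astype(np.int8).reshape(-1) - 1

w_q = np.random.randint(-1, 2, size=64).astype(np.int8)
packed = pack_ternary(w_q)
assert np.array_equal(unpack_ternary(packed), w_q)
print(w_q.nbytes, "->", packed.nbytes, "bytes")  # 64 -> 16
```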
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
Multimodal embedding models have gained significant attention for their ability to map data from different modalities, such as text and images, into a unified representation space. However, the limited labeled multimodal data often hinders…
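As background for "mapping data from different modalities into a unified representation space", the snippet below shows a symmetric contrastive (InfoNCE-style) loss over paired text and image embeddings, a common training objective for such models; it is an assumption for illustration, not mmE5's exact recipe.

```python
import numpy as np

def info_nce(text_emb: np.ndarray, image_emb: np.ndarray,
             temperature: float = 0.05) -> float:
    """Symmetric contrastive loss over a batch of paired text/image embeddings.

    Pulls matched pairs together and pushes mismatched pairs apart in the
    shared space; shown for illustration, not as mmE5's training objective.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature              # pairwise cosine similarities
    labels = np.arange(len(t))                  # matched pairs on the diagonal
    log_sm_t2v = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_v2t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_t2v = -log_sm_t2v[labels, labels].mean()
    loss_v2t = -log_sm_v2t[labels, labels].mean()
    return 0.5 * (loss_t2v + loss_v2t)

rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(8, 32)), rng.normal(size=(8, 32))))
```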
Examining False Positives under Inference Scaling for Mathematical Reasoning
Recent advancements in language models have led to significant improvements in mathematical reasoning across various benchmarks. However, most of these benchmarks rely on automatic evaluation methods that only compare final answers using h…
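To make the concern concrete, here is a minimal example of the kind of heuristic final-answer grader the abstract alludes to: it normalizes and compares strings, so a flawed derivation that happens to end at the right value is still marked correct. The normalization rules are illustrative assumptions.

```python
def answers_match(predicted: str, reference: str) -> bool:
    """Heuristic final-answer check: normalize formatting, then compare.

    Illustrates answer-only grading: it never inspects the reasoning, so a
    flawed derivation ending at the right value still scores as correct
    (a false positive), while formatting differences can cause misses.
    """
    def normalize(s: str) -> str:
        return s.strip().lower().replace(" ", "").rstrip(".")
    return normalize(predicted) == normalize(reference)

# An unsound solution that happens to end in "42" gets full credit ...
print(answers_match("42.", "42"))     # True
# ... while a correct answer in an unexpected format may be rejected.
print(answers_match("x = 42", "42"))  # False
```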
Chain-of-Retrieval Augmented Generation
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually perform a single retrieval step before t…
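The sketch below illustrates the general retrieve-then-reason loop the abstract describes: issue intermediate queries, accumulate evidence, then answer. The toy keyword retriever, corpus, and hand-written sub-queries stand in for learned components and are not CoRAG's implementation.

```python
# Minimal sketch of an iterative retrieve-and-reason loop in the spirit of
# chain-of-retrieval RAG. The toy keyword retriever, corpus, and hand-written
# sub-queries stand in for learned components; none of this is CoRAG's code.

CORPUS = [
    "Marie Curie was born in Warsaw in 1867.",
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "Warsaw is the capital of Poland.",
]

def retrieve(query: str, k: int = 1):
    """Score documents by keyword overlap with the query (toy retriever)."""
    q_terms = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda d: -len(q_terms & set(d.lower().split())))
    return ranked[:k]

def chain_of_retrieval(sub_queries):
    """Retrieve step by step, accumulating evidence before answering."""
    evidence = []
    for sq in sub_queries:
        evidence.extend(retrieve(sq))
    return evidence  # a generator would condition its final answer on this chain

# "In which country was Marie Curie born?" decomposed into two hops.
steps = ["where was Marie Curie born",
         "which country is Warsaw the capital of"]
print(chain_of_retrieval(steps))
```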
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet often rely on single-paradigm reasoning, limiting their effectiveness across diverse tasks. We introduce Chain-of-Reasoning (CoR), a novel unified frame…
GeAR: Generation Augmented Retrieval
Document retrieval techniques form the foundation for the development of large-scale information systems. The prevailing methodology is to construct a bi-encoder and compute the semantic similarity. However, such scalar similarity is diffi…
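For context on the bi-encoder baseline the abstract contrasts against, the toy example below encodes query and document independently and reduces relevance to a single dot-product score; the hashed bag-of-words encoder is a stand-in assumption, not GeAR or any real retriever.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy encoder: hashed bag-of-words, L2-normalized.

    Stands in for a learned encoder purely to show the bi-encoder pattern:
    encode both sides independently, then collapse relevance into one
    scalar similarity, which is hard to interpret or explain.
    """
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

query = "efficient edge inference for ternary llms"
docs = ["ternary llm inference on edge devices",
        "a survey of geologic map understanding"]
print([float(embed(query) @ embed(d)) for d in docs])  # first doc should score higher
```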
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Geologic maps, as fundamental diagrams in geological science, provide critical insights into the structure and composition of Earth's subsurface and surface. These maps are indispensable in various fields, including disaster detection, resou…
GeAR: Generation Augmented Retrieval
Document retrieval techniques are essential for developing large-scale information systems. The common approach involves using a bi-encoder to compute the semantic similarity between a query and documents. However, the scalar similarity of…
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
Multiple-choice question (MCQ) datasets like Massive Multitask Language Understanding (MMLU) are widely used to evaluate the commonsense, understanding, and problem-solving abilities of large language models (LLMs). However, the open-sourc…
Context-DPO: Aligning Language Models for Context-Faithfulness
Reliable responses from large language models (LLMs) require adherence to user instructions and retrieved information. While alignment techniques help LLMs align with human intentions and values, improving context-faithfulness through alig…
Multimodal Latent Language Modeling with Next-Token Diffusion
Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video). In this work, we propose Latent Language Modeling (LatentLM), which seamlessly inte…
Preference Optimization for Reasoning with Pseudo Feedback
Preference optimization techniques, such as Direct Preference Optimization (DPO), are frequently employed to enhance the reasoning capabilities of large language models (LLMs) in domains like mathematical reasoning and coding, typically fo…
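Since the abstract builds on Direct Preference Optimization, the snippet below writes out the standard DPO loss for one preference pair as background; how the pseudo feedback decides which response counts as preferred is the paper's contribution and is not modeled here.

```python
import numpy as np

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model. Shown as background for the
    abstract's mention of DPO; the pseudo-feedback labels that pick the
    'chosen' response are the paper's contribution, not modeled here.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return float(-np.log(1.0 / (1.0 + np.exp(-beta * margin))))  # -log sigmoid

# Policy prefers the chosen response more than the reference does -> lower loss.
print(dpo_loss(-12.0, -20.0, -15.0, -18.0))
```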
MH-MoE: Multi-Head Mixture-of-Experts
Multi-Head Mixture-of-Experts (MH-MoE) demonstrates superior performance by using the multi-head mechanism to collectively attend to information from various representation spaces within different experts. In this paper, we present a novel…
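A rough sketch of the multi-head mixture-of-experts idea the abstract refers to, under simplifying assumptions: each token's hidden state is split into heads, each head is top-1 routed to a small expert, and the outputs are merged back. Real MH-MoE layers also include merge/split projections and load-balancing terms that are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, n_experts = 16, 4, 3
d_head = d_model // n_heads

# Toy experts: one small linear map per expert (biases omitted).
experts = [rng.normal(scale=0.1, size=(d_head, d_head)) for _ in range(n_experts)]
router = rng.normal(scale=0.1, size=(d_head, n_experts))

def mh_moe_layer(x: np.ndarray) -> np.ndarray:
    """Split each token into heads, top-1-route each head, merge back.

    A simplified sketch of the multi-head MoE idea; the actual layer adds
    head merge/split projections and load balancing.
    """
    tokens, _ = x.shape
    heads = x.reshape(tokens * n_heads, d_head)     # sub-tokens
    expert_ids = (heads @ router).argmax(axis=1)    # top-1 routing per head
    out = np.empty_like(heads)
    for e in range(n_experts):
        mask = expert_ids == e
        out[mask] = heads[mask] @ experts[e]
    return out.reshape(tokens, d_model)

x = rng.normal(size=(5, d_model))
print(mh_moe_layer(x).shape)  # (5, 16)
```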
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Recent research on the 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining their performance. In this work, we introduce BitNet a4.8, enabling 4…
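To ground what "4-bit activations" means numerically, here is an illustrative per-token absmax quantization of activations to the INT4 range; which layers BitNet a4.8 actually keeps at higher precision, and its exact scheme, are beyond this excerpt.

```python
import numpy as np

def quantize_activations_int4(x: np.ndarray, eps: float = 1e-6):
    """Per-token absmax quantization of activations to the INT4 range [-7, 7].

    An illustrative sketch of 4-bit activation quantization; BitNet a4.8's
    actual scheme is not reproduced here.
    """
    scale = np.abs(x).max(axis=-1, keepdims=True) + eps  # one scale per token
    x_q = np.clip(np.rint(x / scale * 7.0), -7, 7).astype(np.int8)
    return x_q, scale  # dequantize approximately as x_q / 7.0 * scale

x = np.random.randn(2, 8).astype(np.float32)
x_q, scale = quantize_activations_int4(x)
print(x_q)
print(np.abs(x - x_q / 7.0 * scale).max())  # worst-case quantization error
```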