Lean Wang
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce…
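The snippet is cut off before the DSA details, but the general idea behind this family of sparse-attention mechanisms is that each query attends to only a small, cheaply selected subset of past tokens instead of the full context. Below is a minimal sketch of top-k token selection driven by a lightweight indexer projection; the function name, shapes, and scoring scheme are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, idx_q, idx_k, top_k=64):
    """Illustrative top-k sparse attention for a single head and sequence.

    q, k, v:      [T, d] query/key/value states
    idx_q, idx_k: [T, d_i] small "indexer" projections used only to score
                  which past tokens each query keeps (d_i << d)
    top_k:        number of past tokens each query attends to
    """
    T, d = q.shape

    # Cheap relevance scores from the low-dimensional indexer projections.
    index_scores = idx_q @ idx_k.t()                         # [T, T]
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    index_scores = index_scores.masked_fill(~causal, float("-inf"))

    # Each query keeps only its top-k highest-scoring past tokens.
    keep = index_scores.topk(min(top_k, T), dim=-1).indices  # [T, k]

    # Full attention restricted to the selected positions (dense here for
    # clarity; a real kernel would never compute the unselected entries).
    scores = (q @ k.t()) / d ** 0.5
    scores = scores.masked_fill(~causal, float("-inf"))
    select = torch.full_like(scores, float("-inf"))
    select.scatter_(1, keep, 0.0)                            # 0 where kept, -inf elsewhere
    probs = F.softmax(scores + select, dim=-1)
    return probs @ v                                         # [T, d]
```

The sketch still materializes the full score matrix for readability; the efficiency argument rests on kernels that only ever compute attention over the selected tokens.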
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning
General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable…
DHGRPO: Domain-Induced, Hierarchical Group Relative Policy Optimization
DHGRPO (Domain-Induced Hierarchical Group Relative Policy Optimization) is a mathematically grounded extension of Group Relative Policy Optimization (GRPO) that mitigates group-level failure modes in preference-based fine-tuning of large language models…
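For context on the baseline being extended: GRPO forms advantages by standardizing rewards within a group of responses sampled for the same prompt, with no learned value function. The sketch below shows that group-relative computation plus one plausible reading of the "domain-induced, hierarchical" idea (normalize within a prompt group, then rescale per domain); the domain-level step and all names are assumptions for illustration, not the paper's formulation.

```python
import torch

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within each prompt group.

    rewards: [G, N] rewards for N sampled responses to each of G prompts.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)                    # [G, N]

def hierarchical_advantages(rewards_by_domain, domain_weights=None):
    """Hypothetical hierarchical variant: normalize within each prompt group,
    then rescale per domain so that no single domain dominates the update.

    rewards_by_domain: dict mapping a domain name to a [G, N] reward tensor.
    domain_weights:    optional dict of per-domain scaling factors.
    """
    out = {}
    for domain, rewards in rewards_by_domain.items():
        adv = group_relative_advantages(rewards)
        w = 1.0 if domain_weights is None else domain_weights.get(domain, 1.0)
        out[domain] = w * adv
    return out
```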
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
The rapid growth of online video platforms, particularly live streaming services, has created an urgent need for real-time video understanding systems. These systems must process continuous video streams and respond to user queries instantly…
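The redundancy claim in the title suggests a simple way to exploit it: drop patch tokens whose features barely change between consecutive frames, keeping detail only where the scene actually moves. The sketch below is an illustrative similarity-threshold filter, not the paper's method; the threshold and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def drop_static_tokens(prev_frame, curr_frame, sim_threshold=0.95):
    """Keep only the visual tokens of the current frame that changed enough.

    prev_frame, curr_frame: [P, d] patch features of two consecutive frames.
    Returns the kept tokens and their patch indices.
    """
    sim = F.cosine_similarity(prev_frame, curr_frame, dim=-1)   # [P]
    keep = sim < sim_threshold                                  # "changed" patches only
    return curr_frame[keep], keep.nonzero(as_tuple=True)[0]
```

On a mostly static stream, a filter like this discards the large majority of tokens, which is the intuition behind the title's redundancy figure.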
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses a significant challenge. Sparse attention offers a promising direction for improving…
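Hardware-aligned sparse attention typically operates on contiguous blocks of keys and values rather than on individual tokens, so that the selected memory stays coalesced. Below is a minimal sketch of blockwise top-k selection; the block size, the mean-pooled block scoring, the omission of causal masking, and all names are illustrative assumptions rather than NSA's actual design.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, top_blocks=4):
    """Each query attends only to its top-scoring key/value blocks.

    q: [Tq, d]; k, v: [Tk, d], with Tk assumed divisible by block_size.
    Causal masking is omitted for brevity.
    """
    d = q.shape[-1]
    n_blocks = k.shape[0] // block_size
    k_blocks = k.view(n_blocks, block_size, d)
    v_blocks = v.view(n_blocks, block_size, d)

    # Score each block by a cheap summary of its keys (mean pooling).
    block_keys = k_blocks.mean(dim=1)                        # [n_blocks, d]
    block_scores = q @ block_keys.t()                        # [Tq, n_blocks]
    sel = block_scores.topk(min(top_blocks, n_blocks), dim=-1).indices

    out = torch.empty_like(q)
    for i in range(q.shape[0]):          # per-query gather, written for clarity not speed
        ks = k_blocks[sel[i]].reshape(-1, d)                 # [top_blocks*block_size, d]
        vs = v_blocks[sel[i]].reshape(-1, d)
        attn = F.softmax(ks @ q[i] / d ** 0.5, dim=-1)
        out[i] = attn @ vs
    return out
```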
DeepSeek-V3 Technical Report
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention…
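The 671B-total / 37B-activated split is the defining property of an MoE layer: a router sends each token to a small number of experts, so only a fraction of the parameters (here roughly 37/671 ≈ 5.5%) participates in any one forward pass. A minimal top-k routing sketch follows; the sizes, the softmax over selected gates, and the names are assumptions for illustration, not the DeepSeek-V3 router.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal mixture-of-experts layer with top-k routing (illustrative)."""

    def __init__(self, d_model=256, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: [tokens, d_model]
        gate_logits = self.router(x)                         # [tokens, n_experts]
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                 # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Only the selected experts run for each token, which is why the activated parameter count can be a small fraction of the total.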
Temporal Reasoning Transfer from Text to Video
Video Large Language Models (Video LLMs) have shown promising capabilities in video comprehension, yet they struggle with tracking temporal changes and reasoning about temporal relationships. While previous research attributed this limitation…
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
The visual projector, which bridges the vision and language modalities and facilitates cross-modal alignment, serves as a crucial component in MLLMs. However, measuring the effectiveness of projectors in vision-language alignment remains u…
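A projector that only compresses visual tokens can be as simple as pooling the patch grid down to a coarser grid before projecting into the LLM embedding space, leaving semantic abstraction to the language model itself. The sketch below shows such a compression-only design (2D adaptive average pooling followed by a linear projection), in the spirit of decoupling compression from abstraction; all dimensions and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoolingProjector(nn.Module):
    """Compress a ViT patch grid, then project into the LLM embedding space."""

    def __init__(self, vision_dim=1024, llm_dim=4096, grid_in=24, grid_out=12):
        super().__init__()
        self.grid_in = grid_in
        self.pool = nn.AdaptiveAvgPool2d(grid_out)   # 24x24 -> 12x12 patches
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_tokens):                 # [B, grid_in*grid_in, vision_dim]
        b, n, c = patch_tokens.shape
        x = patch_tokens.transpose(1, 2).reshape(b, c, self.grid_in, self.grid_in)
        x = self.pool(x)                             # [B, C, grid_out, grid_out]
        x = x.flatten(2).transpose(1, 2)             # [B, grid_out*grid_out, C]
        return self.proj(x)                          # [B, grid_out*grid_out, llm_dim]
```

With these example sizes, 576 patch tokens are reduced to 144 (4x compression) without any learned query tokens.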
Towards Codable Watermarking for Injecting Multi-bits Information to LLMs
As large language models (LLMs) generate texts with increasing fluency and realism, there is a growing need to identify the source of texts to prevent the abuse of LLMs. Text watermarking techniques have proven reliable in distinguishing w…
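A common way to make a watermark carry a multi-bit message rather than a single yes/no signal is to let the payload bits influence which part of the vocabulary is favored at each decoding step, so a detector that knows the key can read the bits back out. The sketch below is a simplified bit-keyed green-list bias over logits; the hashing scheme, bias strength, and function names are illustrative assumptions, not the paper's codable watermarking scheme.

```python
import torch

def biased_logits(logits, prev_token, message_bits, step, key=1234, delta=2.0):
    """Bias next-token logits toward a pseudo-random 'green list' chosen by
    the current payload bit (illustrative multi-bit watermark sketch).

    logits:       [V] next-token logits
    prev_token:   previous token id (int), seeds the green list as in hash-based schemes
    message_bits: list of 0/1 payload bits; the bit for this step is message_bits[step % len]
    """
    vocab = logits.shape[0]
    bit = message_bits[step % len(message_bits)]

    # The seed depends on the key, the local context, and the bit being embedded,
    # so a detector can test both hypotheses (bit = 0 vs bit = 1) at each position.
    gen = torch.Generator().manual_seed(key * 1_000_003 + prev_token * 2 + bit)
    green = torch.rand(vocab, generator=gen) < 0.5            # ~half the vocabulary

    return logits + delta * green.float()
```

Detection re-derives both candidate green lists at each position and checks which hypothesis the observed tokens favor, recovering the bits without access to the model.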
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
In-context learning (ICL) emerges as a promising capability of large language models (LLMs) by providing them with demonstration examples to perform diverse tasks. However, the underlying mechanism of how LLMs learn from the provided context…
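The "anchor" view can be probed directly from attention maps: measure how much the label-word positions attend to the surrounding demonstration text in shallow layers, and how much the prediction position attends to the label words in deep layers. The sketch below computes those two aggregates from a stack of attention matrices; the tensor layout and the exact metrics are assumptions for illustration rather than the paper's saliency analysis.

```python
import torch

def anchor_flow_scores(attn, label_pos, target_pos):
    """Aggregate attention flow through label-word "anchor" positions.

    attn:       [layers, heads, T, T] attention weights (row = query, column = key)
    label_pos:  positions of the label words in the demonstrations
    target_pos: position where the prediction is made (usually T - 1)
    """
    attn = attn.mean(dim=1)                                  # average heads -> [layers, T, T]
    label_pos = torch.as_tensor(label_pos)

    # Shallow-layer picture: how much each label word attends to the
    # surrounding (non-label) demonstration text, i.e. aggregates it.
    non_label = torch.ones(attn.shape[-1], dtype=torch.bool)
    non_label[label_pos] = False
    text_to_label = attn[:, label_pos, :][:, :, non_label].sum(-1).mean(-1)   # [layers]

    # Deep-layer picture: how much the prediction position attends to the anchors.
    label_to_target = attn[:, target_pos, :][:, label_pos].sum(-1)            # [layers]

    return text_to_label, label_to_target
```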
Gradient Knowledge Distillation for Pre-trained Language Models
Knowledge distillation (KD) is an effective framework to transfer knowledge from a large-scale teacher to a compact yet well-performing student. Previous KD practices for pre-trained language models mainly transfer knowledge by aligning in…
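Beyond matching output distributions, a gradient-level distillation objective asks the student's gradients to point in the same direction as the teacher's. The sketch below combines a standard KL output-matching term with a cosine mismatch between the two models' task-loss gradients with respect to shared input embeddings; the weighting, the choice of gradient target, and the shared-embedding setup are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def gradient_kd_loss(student, teacher, embeds, labels, alpha=0.5, temp=2.0):
    """Response-based KD plus an illustrative gradient-alignment term.

    student, teacher: callables mapping input embeddings [B, T, d] to logits [B, T, V]
                      (teacher parameters are assumed frozen)
    embeds:           shared input embeddings [B, T, d]
    labels:           [B, T] token ids used to form each model's task loss
    """
    embeds = embeds.detach().requires_grad_(True)
    s_logits = student(embeds)
    t_logits = teacher(embeds)      # keep the graph: we need d(teacher loss)/d(embeds)

    # 1) Usual output-matching KD term (teacher distribution treated as a constant).
    kd = F.kl_div(
        F.log_softmax(s_logits / temp, dim=-1),
        F.softmax(t_logits.detach() / temp, dim=-1),
        reduction="batchmean",
    ) * temp ** 2

    # 2) Align gradients of each model's task loss w.r.t. the shared input embeddings.
    s_task = F.cross_entropy(s_logits.flatten(0, 1), labels.flatten())
    t_task = F.cross_entropy(t_logits.flatten(0, 1), labels.flatten())
    s_grad = torch.autograd.grad(s_task, embeds, create_graph=True)[0]
    t_grad = torch.autograd.grad(t_task, embeds)[0].detach()
    grad_align = 1 - F.cosine_similarity(s_grad.flatten(1), t_grad.flatten(1), dim=-1).mean()

    return kd + alpha * grad_align
```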