
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, et al. · 2025

General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerabl…

DHGRPO: Domain-Induced, Hierarchical Group Relative Policy Optimization

DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, et al. · 2025

DHGRPO (Domain-Induced Hierarchical Group Relative Policy Optimization) is a mathematically grounded extension of Group Relative Policy Optimization (GRPO) that mitigates group-level failure modes in preference-based fine-tuning of large l…

DeepSeek-V3 Technical Report

DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, et al. · 2024

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attenti…

Polystyrene Microplastics Postpone APAP-Induced Liver Injury through Impeding Macrophage Polarization

Jing Liu, Lecong Zhang, Fang Xu, Songyan Meng, Haitian Li, et al. · 2022

Polystyrene microplastics (PS MPs) are micrometer-scale items degraded from plastics and have been detected in various organisms. PS MPs have been identified as causing cognitive, cardiac, intestinal, and hepatic damage. However, their rol…
