Lecong Zhang
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning
General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerabl…
DHGRPO: Domain-Induced, Hierarchical Group Relative Policy Optimization
DHGRPO (Domain-Induced Hierarchical Group Relative Policy Optimization) is a mathematically grounded extension of Group Relative Policy Optimization (GRPO) that mitigates group-level failure modes in preference-based fine-tuning of large l…
DeepSeek-V3 Technical Report
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attenti…
Polystyrene Microplastics Postpone APAP-Induced Liver Injury through Impeding Macrophage Polarization
Polystyrene microplastics (PS MPs) are micrometer-scale particles produced by the degradation of plastics and have been detected in various organisms. PS MPs have been identified as causing cognitive, cardiac, intestinal, and hepatic damage. However, their rol…