SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM

arXiv (Cornell University), April 2025
Xiaojiang Zhang, Jinghui Wang, Zifei Cheng, Wenhao Zhuang, Zheng Lin, Minglei Zhang, Shaojie Wang, Y. Cui, Chao Wang, Junyi Peng, Shimiao Jiang, Shih…

Recent advances in reasoning models, exemplified by OpenAI's o1 and DeepSeek's R1, highlight the significant potential of Reinforcement Learning (RL) to enhance the reasoning capabilities of Large Language Models (LLMs). However, replicating these advances across diverse domains remains challenging because of limited methodological transparency. In this work, we present two-Staged history-Resampling Policy Optimization (SRPO), which surpasses the performance of DeepSeek-R1-Zero-32B on the AIME24 and LiveCodeBench …
