Exploring foci of:
arXiv (Cornell University)
SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM
April 2025 • Xiaojiang Zhang, Jinghui Wang, Zifei Cheng, Wenhao Zhuang, Zheng Lin, Minglei Zhang, Shaojie Wang, Y. Cui, Chao Wang, Junyi Peng, Shimiao Jiang, Shih…
Recent advances in reasoning models, exemplified by OpenAI's o1 and DeepSeek's R1, highlight the significant potential of Reinforcement Learning (RL) to enhance the reasoning capabilities of Large Language Models (LLMs). However, replicating these advancements across diverse domains remains challenging due to limited methodological transparency. In this work, we present two-Staged history-Resampling Policy Optimization (SRPO), which surpasses the performance of DeepSeek-R1-Zero-32B on the AIME24 and LiveCodeBench …
Learning Curve
Learning Theory (Education)
Experiential Learning
Practice (Learning Method)
Learning Environment
Machine Learning
Deep Learning
Reinforcement Learning
Learning Standards
Attention (Machine Learning)
Learning
Learning Disability
Decision Tree Learning
Q-Learning
Higher Learning
Learning To Crawl
List Of Datasets For Machine-Learning Research
Federated Learning
Ensemble Learning
Quantum Machine Learning
Cathedral Of Learning
Adversarial Machine Learning
Torch (Machine Learning)
Project-Based Learning