arXiv (Cornell University)
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
June 2025 • Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Y. H. Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nolan Dey, Yuanyuan Zha, Yi Gu, Kun Zhou, Haijun …
Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reasoning. A key challenge lies in the lack of reliable, scalable RL reward signals across diverse reasoning domains. We introduce Guru, a curated RL reasoning corpus of 92K verifiable examples spanning six reasoning domains (Math, Code, Science, Logic, Simulation, and Tabular), each …