Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published 15 days ago • 45
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 25 days ago • 93
MathArena Benchmark Collection Competitions that are in the MathArena benchmark and on the website. • 17 items • Updated 18 days ago • 2
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3 • 32
Running 3.6k The Ultra-Scale Playbook 🌌 3.6k The ultimate guide to training LLM on large GPU Clusters