Parallel-R1: Towards Parallel Thinking via Reinforcement Learning • Paper • 2509.07980 • Published Sep 9 • 101
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices • Paper • 2512.01374 • Published 10 days ago • 83
How Far Are We from Genuinely Useful Deep Research Agents? • Paper • 2512.01948 • Published 9 days ago • 52
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral • Paper • 2512.04220 • Published 7 days ago • 11