Collections including paper arxiv:2511.06221

- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
  Paper • 2504.20571 • Published • 98
- One RL to See Them All: Visual Triple Unified Reinforcement Learning
  Paper • 2505.18129 • Published • 60
- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
  Paper • 2503.16219 • Published • 52
- Performance Trade-offs of Optimizing Small Language Models for E-Commerce
  Paper • 2510.21970 • Published • 2

- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
  Paper • 2511.06221 • Published • 128
- Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey
  Paper • 2511.07448 • Published • 2
- Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
  Paper • 2511.16043 • Published • 105

- HuggingFaceTB/SmolLM3-3B
  Text Generation • 3B • Updated • 90.8k • 834
- HuggingFaceFW/fineweb
  Viewer • Updated • 52.5B • 202k • 2.48k
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
  Paper • 2511.06221 • Published • 128
- p-e-w/Llama-3.1-8B-Instruct-heretic
  Text Generation • 8B • Updated • 941 • 6

- Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
  Paper • 2510.20150 • Published • 4
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
  Paper • 2511.06221 • Published • 128
- We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
  Paper • 2508.10433 • Published • 144

- ARE: Scaling Up Agent Environments and Evaluations
  Paper • 2509.17158 • Published • 35
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
  Paper • 2510.08551 • Published • 32
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
  Paper • 2510.04212 • Published • 23
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
  Paper • 2510.12693 • Published • 26

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 8.01k • 1.22k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 141 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63