Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning Paper • 2510.27623 • Published Oct 31 • 12
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning Paper • 2510.24320 • Published Oct 28 • 18 • 3
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16 • 39
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance Paper • 2502.12459 • Published Feb 18 • 2
LaSeR Collection Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding" • 5 items • Updated Oct 17 • 1
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16 • 39 • 2