A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers • Paper • 2508.21148 • Published Aug 28 • 140
SPARK: Synergistic Policy And Reward Co-Evolving Framework • Paper • 2509.22624 • Published Sep 26 • 17
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization • Paper • 2510.08540 • Published Oct 9 • 109
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence • Paper • 2505.23764 • Published May 29 • 3
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings • Paper • 2506.04997 • Published Jun 5
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents • Paper • 2507.19478 • Published Jul 25 • 31
GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs • Paper • 2506.00991 • Published Jun 1
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published Aug 25 • 208
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing • Paper • 2504.02826 • Published Apr 3 • 68
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? • Paper • 2503.19990 • Published Mar 25 • 35
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM • Paper • 2503.14478 • Published Mar 18 • 48
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning • Paper • 2503.10291 • Published Mar 13 • 36
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference • Paper • 2502.18411 • Published Feb 25 • 74
VideoRoPE: What Makes for Good Video Rotary Position Embedding? • Paper • 2502.05173 • Published Feb 7 • 65
OCSampler: Compressing Videos to One Clip with Single-step Sampling • Paper • 2201.04388 • Published Jan 12, 2022