HAODONG DUAN

KennyUTC

https://kennymckormick.github.io

AI & ML interests

Video Understanding; Multi-Modal Learning

Recent Activity

upvoted a paper about 15 hours ago

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

upvoted a paper about 1 month ago

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

upvoted a paper about 1 month ago

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Organizations

authored 3 papers about 2 months ago

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Paper • 2508.21148 • Published Aug 28 • 140

SPARK: Synergistic Policy And Reward Co-Evolving Framework

Paper • 2509.22624 • Published Sep 26 • 17

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9 • 109

authored 6 papers 3 months ago

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

Paper • 2505.23764 • Published May 29 • 3

Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings

Paper • 2506.04997 • Published Jun 5

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208

authored a paper 7 months ago

Visual Agentic Reinforcement Fine-Tuning

Paper • 2505.14246 • Published May 20 • 32

authored 2 papers 8 months ago

MM-IFEngine: Towards Multimodal Instruction Following

Paper • 2504.07957 • Published Apr 10 • 35

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published Apr 3 • 68

authored 4 papers 9 months ago

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?

Paper • 2503.19990 • Published Mar 25 • 35

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

Paper • 2503.14478 • Published Mar 18 • 48

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13 • 36

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 85

authored 2 papers 10 months ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25 • 74

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Paper • 2502.05173 • Published Feb 7 • 65

authored 2 papers 11 months ago

OCSampler: Compressing Videos to One Clip with Single-step Sampling

Paper • 2201.04388 • Published Jan 12, 2022

Redundancy Principles for MLLMs Benchmarks

Paper • 2501.13953 • Published Jan 20 • 29
