Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Abstract
MoRe4D generates high-quality 4D scenes with multi-view consistency and dynamic details from a single image using a diffusion-based trajectory generator and depth-guided motion normalization.
Generating interactive and dynamic 4D scenes from a single static image remains a core challenge. Most existing generate-then-reconstruct and reconstruct-then-generate methods decouple geometry from motion, causing spatiotemporal inconsistencies and poor generalization. To address these issues, we extend the reconstruct-then-generate framework to jointly perform Motion generation and geometric Reconstruction for 4D Synthesis (MoRe4D). We first introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories, addressing the scarcity of high-quality 4D scene data. Based on this dataset, we propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) that jointly generates geometrically consistent and motion-plausible 4D point trajectories. To leverage single-view priors, we design a depth-guided motion normalization strategy and a motion-aware module that integrate geometry and dynamics. We then propose a 4D View Synthesis Module (4D-ViSM) to render videos with arbitrary camera trajectories from 4D point track representations. Experiments show that MoRe4D generates high-quality 4D scenes with multi-view consistency and rich dynamic details from a single image. Code: https://github.com/Zhangyr2022/MoRe4D.
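The abstract describes 4D-ViSM as rendering videos under arbitrary camera trajectories from 4D point tracks. 4D-ViSM itself is a learned module whose internals are not described here, so the following is only a minimal, hypothetical illustration of the representation: a scene stored as per-frame 3D positions of N tracked points, projected into a chosen camera for each frame with a standard pinhole model. The names `PinholeCamera` and `project_tracks`, the world-to-camera convention X_c = R X_w + t, and the visibility rule (points with non-positive depth are dropped) are assumptions for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical illustration only; this is NOT the MoRe4D / 4D-ViSM implementation.
Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Pixel = Optional[Tuple[float, float]]


@dataclass
class PinholeCamera:
    """Assumed world-to-camera extrinsics (R, t) plus pinhole intrinsics."""
    R: Mat3
    t: Vec3
    fx: float
    fy: float
    cx: float
    cy: float

    def project(self, p: Vec3) -> Pixel:
        # Transform the world point into camera coordinates: X_c = R X_w + t.
        xc = [sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i] for i in range(3)]
        if xc[2] <= 0.0:
            return None  # on or behind the image plane: treated as not visible
        # Perspective division followed by the intrinsic mapping to pixels.
        return (self.fx * xc[0] / xc[2] + self.cx, self.fy * xc[1] / xc[2] + self.cy)


def project_tracks(
    tracks: Sequence[Sequence[Vec3]], cameras: Sequence[PinholeCamera]
) -> List[List[Pixel]]:
    """tracks[t][n] is the 3D position of point n at frame t;
    cameras[t] is the (possibly novel) camera used to view frame t."""
    if len(tracks) != len(cameras):
        raise ValueError("need exactly one camera per frame")
    return [[cam.project(p) for p in frame] for frame, cam in zip(tracks, cameras)]


# Usage: a static identity camera viewing two points over two frames.
I3: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cam = PinholeCamera(R=I3, t=(0.0, 0.0, 0.0), fx=100.0, fy=100.0, cx=50.0, cy=50.0)
# The second point moves behind the camera in frame 1, so it becomes invisible.
tracks = [[(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)], [(0.0, 0.0, 4.0), (0.0, 0.0, -1.0)]]
result = project_tracks(tracks, [cam, cam])
print(result)  # [[(50.0, 50.0), (100.0, 50.0)], [(50.0, 50.0), None]]
```

Passing a different `PinholeCamera` per frame corresponds to viewing the same point tracks along a new camera trajectory; a learned module such as 4D-ViSM would still be needed to turn these sparse projections into full video frames.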
Community
Project Page: https://ivg-yanranzhang.github.io/MoRe4D/
Github Repo: https://github.com/Zhangyr2022/MoRe4D
The dataset is coming soon. Stay tuned!
