MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds Paper • 2508.14879 • Published Aug 20 • 68
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published Aug 26 • 43
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels Paper • 2508.17437 • Published Aug 20 • 38
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations Paper • 2509.09676 • Published Sep 11 • 33
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15 • 105
3D-LLM: Injecting the 3D World into Large Language Models Paper • 2307.12981 • Published Jul 24, 2023 • 38
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation Paper • 2510.08551 • Published Oct 9 • 33
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9 • 125
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets Paper • 2510.19944 • Published Oct 22 • 19
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 177
Error-Driven Scene Editing for 3D Grounding in Large Language Models Paper • 2511.14086 • Published Nov 18 • 6
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published Nov 13 • 95
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Paper • 2510.27606 • Published Oct 31 • 28
NaTex: Seamless Texture Generation as Latent Color Diffusion Paper • 2511.16317 • Published Nov 20 • 15
MiMo-Embodied: X-Embodied Foundation Model Technical Report Paper • 2511.16518 • Published Nov 20 • 24
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model Paper • 2512.01030 • Published 25 days ago • 19
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Paper • 2512.03000 • Published 23 days ago • 36
SIMA 2: A Generalist Embodied Agent for Virtual Worlds Paper • 2512.04797 • Published 22 days ago • 23
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image Paper • 2512.05044 • Published 21 days ago • 16
ProPhy: Progressive Physical Alignment for Dynamic World Simulation Paper • 2512.05564 • Published 21 days ago • 5
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos Paper • 2512.10881 • Published 14 days ago • 29
SS4D: Native 4D Generative Model via Structured Spacetime Latents Paper • 2512.14284 • Published 10 days ago • 13
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation Paper • 2512.16913 • Published 7 days ago • 33
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published 9 days ago • 64
Towards Seamless Interaction: Causal Turn-Level Modeling of Interactive 3D Conversational Head Dynamics Paper • 2512.15340 • Published 9 days ago • 1
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation Paper • 2512.17495 • Published 7 days ago • 18
3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework Paper • 2512.17459 • Published 7 days ago • 11
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation Paper • 2512.17012 • Published 7 days ago • 41
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published 7 days ago • 71
MatSpray: Fusing 2D Material World Knowledge on 3D Geometry Paper • 2512.18314 • Published 6 days ago • 7
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models Paper • 2512.19526 • Published 4 days ago • 8