Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
JiaKui Hu, Shanshan Zhao, Qing-Guo Chen, Xuerui Qiu, Jialun Liu, Zhao Xu, Weihua Luo, Kaifu Zhang, Yanye Lu
Peking University, Alibaba International Digital Commerce Group
This paper presents Omni-View, which extends unified multimodal understanding and generation to 3D scenes based on multiview images and explores the principle that "generation facilitates understanding". Consisting of an understanding model, a texture module, and a geometry module, Omni-View jointly models scene understanding, novel view synthesis, and geometry estimation, enabling synergistic interaction between 3D scene understanding and generation tasks. By design, it leverages the spatiotemporal modeling capabilities of its texture module, which is responsible for appearance synthesis, alongside the explicit geometric constraints provided by its dedicated geometry module, thereby enriching the model's holistic understanding of 3D scenes. Trained with a two-stage strategy, Omni-View achieves a state-of-the-art score of 55.4 on the VSI-Bench benchmark, outperforming existing specialized 3D understanding models, while simultaneously delivering strong performance in both novel view synthesis and 3D scene generation.

Quick Start
1. Set up environment
git clone https://github.com/AIDC-AI/Omni-View.git
cd Omni-View
conda create -n omniview python=3.10 -y
conda activate omniview
pip install torch==2.6.0 torchvision # please follow the instructions at https://pytorch.org/get-started/previous-versions/
pip install -r requirements.txt
pip install flash_attn==2.7.4 --no-build-isolation
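After installation, it can help to confirm that the pinned packages actually landed in the active environment. The following is a minimal sketch, not part of the repository: the helper name `check_environment` and the `EXPECTED` table are illustrative assumptions, with the pins copied from the install commands above. It reads package metadata only, so it does not need a GPU and does not import the packages themselves.

```python
from importlib import metadata

# Pins copied from the install commands above; torchvision is left unpinned there.
# This table and the helper below are illustrative, not part of Omni-View.
EXPECTED = {"torch": "2.6.0", "torchvision": None, "flash_attn": "2.7.4"}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, ok)} using package metadata only."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # A local build tag such as "2.6.0+cu124" still counts as the pinned release.
        ok = installed is not None and (
            pinned is None or installed.split("+")[0] == pinned
        )
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "OK" if ok else "CHECK"
        print(f"{name:12s} {installed or 'not installed':16s} {status}")
```

A `CHECK` line only means the installed version differs from the pin or the package is missing; whether other versions also work is not stated in this README.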
2. Download the pre-trained checkpoints of BAGEL and Omni-View
# BAGEL, configs and VAE
from huggingface_hub import snapshot_download
save_dir = "./pretrained_model"
repo_id = "ByteDance-Seed/BAGEL-7B-MoT"
cache_dir = save_dir + "/cache"
snapshot_download(
    cache_dir=cache_dir,
    local_dir=save_dir,
    repo_id=repo_id,
    local_dir_use_symlinks=False,
    resume_download=True,
    allow_patterns=["*.json", "ae.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
)
# Omni-View
huggingface-cli download AIDC-AI/Omni-View --local-dir ./
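The `allow_patterns` argument in the BAGEL download above limits which repository files are fetched. As a rough illustration of its effect, the sketch below applies the same patterns to a file listing with Python's `fnmatch`, on the assumption that `huggingface_hub` matches these patterns in the same glob style. The helper `select_files` and the example listing are hypothetical and are not the actual contents of the BAGEL repository.

```python
from fnmatch import fnmatch

# Patterns copied from the snapshot_download call above.
ALLOW_PATTERNS = ["*.json", "ae.safetensors", "*.bin", "*.py", "*.md", "*.txt"]


def select_files(filenames, patterns=ALLOW_PATTERNS):
    """Keep the names that match at least one glob pattern."""
    return [f for f in filenames if any(fnmatch(f, p) for p in patterns)]


# Hypothetical repository listing, for illustration only.
example_listing = ["config.json", "ae.safetensors", "model.safetensors", "README.md"]
print(select_files(example_listing))
# → ['config.json', 'ae.safetensors', 'README.md']
```

Under this reading, `ae.safetensors` is the only `.safetensors` file the BAGEL step fetches, which is consistent with the "BAGEL, configs and VAE" comment; the Omni-View weights come from the separate `AIDC-AI/Omni-View` download.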
Eval
We provide scripts for evaluating 3D scene understanding, spatial reasoning (VSI-Bench), and novel view synthesis. Please see EVAL for more details.
Benchmarks
1. 3D Scene Understanding

2. VSI-Bench

3. Novel View Synthesis

Citation
If you find this work useful in your research, please consider citing:
@misc{hu2025omniview,
title={Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images},
author={JiaKui Hu and Shanshan Zhao and Qing-Guo Chen and Xuerui Qiu and Jialun Liu and Zhao Xu and Weihua Luo and Kaifu Zhang and Yanye Lu},
year={2025},
eprint={2511.07222},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.07222},
}
Acknowledgements
Our implementation is built upon Bagel. We appreciate their great work.
License
Copyright (C) 2025 AIDC-AI
Licensed under the Apache License, Version 2.0.
This project contains various third-party components under other open source licenses. You should respect the terms of those licenses.
The component DiT is released under the CC-BY-NC 4.0 License (for non-commercial purposes only).
See the NOTICE file for more information.