✨SimRecon: SimReady Compositional Scene Reconstruction from Real Videos✨

Chong Xia1,2,*, Kai Zhu1,*, Zizhuo Wang1, Fangfu Liu1, Zhizheng Zhang2, Yueqi Duan1,†
1Tsinghua University   2Galbot

CVPR 2026 🔥


Teaser Visualization

SimRecon: We propose SimRecon, a novel compositional scene reconstruction framework that implements a "Perception-Generation-Simulation" pipeline with specialized bridging modules to ensure high visual fidelity and physical plausibility.

📢 News

  • 🔥 [03/03/2026] We release "SimRecon: SimReady Compositional Scene Reconstruction from Real Videos". Check our project page and arXiv paper.
  • 🔥 [03/03/2026] We release the code of our Active Viewpoint Optimization (AVO) module. Other parts of the project will be released soon.

🌟 Pipeline

Pipeline Visualization

The overall framework of our approach SimRecon. We propose a "Perception-Generation-Simulation" pipeline with object-centric scene representations for compositional 3D scene reconstruction from cluttered video input. In this figure, we provide illustrative visualizations using the backpack as an example to introduce our two core modules: Active Viewpoint Optimization (AVO) and Scene Graph Synthesizer (SGS). Here, we visualize a semantic-level graph for clarity, while our framework operates at the instance level.

⚙️ Setup

1. Clone Repository

git clone https://github.com/xiac20/SimRecon.git
cd SimRecon

2. Environment Setup

  1. Create conda environment
conda create -n simrecon python=3.9 -y
conda activate simrecon 
  2. Install dependencies
pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

pip install --extra-index-url=https://pypi.nvidia.com "cudf-cu11==24.2.*" "cuml-cu11==24.2.*"

pip install -r requirements.txt
  3. Additional Setup

Install CropFormer for instance-level segmentation.

cd semantic_modules/CropFormer
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git
cd ..
pip install -r requirements.txt
pip install -U openmim
mim install mmcv
mkdir ckpts

Manually download the CropFormer checkpoint into semantic_modules/CropFormer/ckpts.

💻 Data Preprocessing

Please follow the steps below to process your custom dataset, or directly download our preprocessed datasets.

1. Get sparse reconstruction from video input

Follow the original repository of COLMAP or HLOC to obtain sparse reconstruction results. For scenes with poor image quality or severe occlusion, we strongly recommend using HLOC or another state-of-the-art method for sparse reconstruction.
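For reference, a standard COLMAP sparse-reconstruction run follows the pattern below. The paths are placeholders for your own scene directory; see the COLMAP documentation for further options (e.g. sequential matching for video frames):

```shell
# Typical COLMAP sparse reconstruction (paths are placeholders for your scene).
colmap feature_extractor \
    --database_path data/scene0000_00/database.db \
    --image_path data/scene0000_00/images
colmap exhaustive_matcher \
    --database_path data/scene0000_00/database.db
mkdir -p data/scene0000_00/sparse
colmap mapper \
    --database_path data/scene0000_00/database.db \
    --image_path data/scene0000_00/images \
    --output_path data/scene0000_00/sparse
```

This produces the sparse/0/cameras.bin (and related) files expected by the data layout below.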

2. Run instance-level segmentation

cd semantic_modules/CropFormer
bash run_segmentation.sh "$DATA_DIR"
cd ../..

3. Train 2DGS

python train_2dgs.py -s data/scene0000_00 -m output/scene0000_00

Put the trained point_cloud.ply file into the $DATA_DIR directory. After successfully executing the above steps, the data directory should be structured as follows:

data
   |——————scene0000_00
      |——————point_cloud.ply
      |——————images
         |——————0.jpg
         ...
      |——————sam
         |——————mask
            |——————0.png
            ...
      |——————sparse
         |——————0
            |——————cameras.bin
            ...

💻 Run Examples

We provide three example scenes to help you get started.

1. Generate 3D semantic segmentation

python train_semantic.py -s data/scene0000_00 \
                         -m train_semanticgs \
                         --use_seg_feature --iterations 2500 \
                         --load_filter_segmap --consider_negative_labels
  • It is normal to get stuck at the DBSCAN filter stage, since the background Gaussian points may be divided into multiple regions.
  • Use --consider_negative_labels to suppress floaters during background segmentation.

2. Optimize best view by AVO

python optimize_by_avo.py --source_path data/scene0000_00 --label_dir output/data/scene0000_00/train_semanticgs/point_cloud/iteration_2500 --max_iterations 100
  • Use --instance_id to specify which object to optimize. If it is not specified, all objects will be optimized.
  • Before optimization, you can inspect the point cloud of each object under output/data/scene0000_00/train_semanticgs/point_cloud/iteration_2500/label_pointclouds.
  • If the AVO results are not ideal, adjust hyperparameters such as the learning rates for rotation and translation or the depth-constraint coefficients; the number of optimization iterations can also be raised to as many as 5000.
  • If optimization fails due to artifact drift in the 2DGS output point cloud, consider recent works dedicated to removing such artifacts.
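When optimizing several objects one at a time, the --instance_id flag can be driven from a small loop. A sketch (the instance IDs below are placeholders; pick real ones by inspecting label_pointclouds):

```shell
# Sketch: run AVO separately per object (instance IDs are placeholders).
for id in 0 1 2; do
  python optimize_by_avo.py --source_path data/scene0000_00 \
      --label_dir output/data/scene0000_00/train_semanticgs/point_cloud/iteration_2500 \
      --instance_id "$id" --max_iterations 100
done
```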

🔗 Acknowledgement

We are thankful to the following great works that helped us when implementing SimRecon:

📚 Citation

@misc{xia2026simreconsimreadycompositionalscene,
  title={SimRecon: SimReady Compositional Scene Reconstruction from Real Videos}, 
  author={Chong Xia and Kai Zhu and Zizhuo Wang and Fangfu Liu and Zhizheng Zhang and Yueqi Duan},
  year={2026},
  eprint={2603.02133},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.02133}, 
}
