Chong Xia (1,2,*), Kai Zhu (1,*), Zizhuo Wang (1), Fangfu Liu (1), Zhizheng Zhang (2), Yueqi Duan (1,†)
(1) Tsinghua University, (2) Galbot
We propose SimRecon, a novel compositional scene reconstruction framework that implements a "Perception-Generation-Simulation" pipeline, with specialized bridging modules that ensure both high visual fidelity and physical plausibility.
- 🔥 [03/03/2026] We release "SimRecon: SimReady Compositional Scene Reconstruction from Real Videos". Check our project page and arXiv paper.
- 🔥 [03/03/2026] We release the code of our Active Viewpoint Optimization (AVO) module. Other parts of the project will be released soon.
Overall framework of SimRecon. We propose a "Perception-Generation-Simulation" pipeline with object-centric scene representations for compositional 3D scene reconstruction from cluttered video input. In this figure, we use a backpack as an example to illustrate our two core modules: Active Viewpoint Optimization (AVO) and Scene Graph Synthesizer (SGS). For clarity, we visualize a semantic-level graph here, although our framework operates at the instance level.
git clone https://github.com/xiac20/SimRecon.git
cd SimRecon
- Create conda environment
conda create -n simrecon python=3.9 -y
conda activate simrecon
- Install dependencies
pip install torch==2.1.0+cu118 torchvision==0.16.0+cu118 torchaudio==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install --extra-index-url=https://pypi.nvidia.com "cudf-cu11==24.2.*" "cuml-cu11==24.2.*"
pip install -r requirements.txt
- Additional Setup
Install CropFormer for instance-level segmentation.
cd semantic_modules/CropFormer
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git
cd ..
pip install -r requirements.txt
pip install -U openmim
mim install mmcv
mkdir ckpts
Manually download the CropFormer checkpoint into semantic_modules/CropFormer/ckpts.
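Before processing data, it can help to confirm that the packages installed above import cleanly from the active environment. The function below is a minimal sketch, not part of the repository: `check_simrecon_env` is a hypothetical name, and its default module list is an assumption based on the import names of the packages installed in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with SimRecon: report which Python modules
# from the install steps fail to import in the active environment.
# The interpreter defaults to `python` and can be overridden via PYTHON=...
check_simrecon_env() {
  local py="${PYTHON:-python}"
  local modules=("$@")
  # Assumed default list: import names of the packages installed above.
  if [ "${#modules[@]}" -eq 0 ]; then
    modules=(torch torchvision cudf cuml detectron2 panopticapi cityscapesscripts mmcv)
  fi
  local m missing=0
  for m in "${modules[@]}"; do
    if "$py" -c "import $m" >/dev/null 2>&1; then
      echo "ok: $m"
    else
      echo "MISSING: $m"
      missing=1
    fi
  done
  # Non-zero exit status if any module could not be imported.
  return "$missing"
}
```

Calling `check_simrecon_env` with no arguments checks the default list; pass module names to check others instead. To confirm GPU access, `python -c "import torch; print(torch.cuda.is_available())"` should print `True`.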
Please follow the steps below to process your custom dataset, or directly download our preprocessed datasets.
Follow the instructions in the original COLMAP or HLOC repositories to obtain a sparse reconstruction. For scenes with poor image quality and severe occlusion, we strongly recommend using HLOC or another state-of-the-art method for sparse reconstruction.
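For the COLMAP route, the standard command-line sequence (feature extraction, matching, incremental mapping) can produce the `sparse/0` folder shown in the data layout below. This is a minimal sketch, assuming the video frames are already extracted into `$DATA_DIR/images` and were captured by a single camera; `colmap_sparse`, `run_cmd`, and the `DRY_RUN` switch are hypothetical conveniences, not part of SimRecon. It also assumes, as in the 3DGS code that 2DGS derives from, that the training loader expects undistorted PINHOLE cameras; for lenses with strong distortion, undistort the images first.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SimRecon: run COLMAP's sparse pipeline on
# <scene>/images and write the model under <scene>/sparse.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

colmap_sparse() {
  local scene="$1"
  local db="$scene/database.db"
  # 1) SIFT features; one shared PINHOLE camera for all frames (assumption).
  run_cmd colmap feature_extractor \
    --database_path "$db" \
    --image_path "$scene/images" \
    --ImageReader.camera_model PINHOLE \
    --ImageReader.single_camera 1 || return 1
  # 2) Sequential matching suits ordered video frames;
  #    exhaustive_matcher is an alternative for small, unordered image sets.
  run_cmd colmap sequential_matcher --database_path "$db" || return 1
  # 3) Incremental mapping; COLMAP writes numbered models (0, 1, ...) here.
  run_cmd mkdir -p "$scene/sparse" || return 1
  run_cmd colmap mapper \
    --database_path "$db" \
    --image_path "$scene/images" \
    --output_path "$scene/sparse"
}
```

`DRY_RUN=1 colmap_sparse data/scene0000_00` prints the commands without running them. When COLMAP registers all frames into a single model, the result lands in `data/scene0000_00/sparse/0`; several numbered folders indicate that the reconstruction split into disconnected models.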
cd semantic_modules/CropFormer
bash run_segmentation.sh "$DATA_DIR"
cd ../..
python train_2dgs.py -s data/scene0000_00 -m output/scene0000_00
Copy the trained 2DGS point_cloud.ply file into the $DATA_DIR directory. After these steps, the data directory should be structured as follows:
data
|——————scene0000_00
        |——————point_cloud.ply
        |——————images
                |——————0.jpg
                ...
        |——————sam
                |——————mask
                        |——————0.png
                        ...
        |——————sparse
                |——————0
                        |——————cameras.bin
                        ...
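A quick check that a scene folder matches this layout can catch a missed step before training starts. The function below is a hypothetical helper, not part of SimRecon. It checks only the entries shown above and assumes that segmentation writes one `sam/mask/<frame>.png` per `images/<frame>.jpg`, which the example file names suggest but the README does not state.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SimRecon: verify that a scene directory
# matches the expected layout. Prints each missing entry and returns
# non-zero if anything is absent.
check_scene_layout() {
  local scene="$1"
  local missing=0 f d img name
  for f in point_cloud.ply sparse/0/cameras.bin; do
    if [ ! -f "$scene/$f" ]; then
      echo "missing file: $scene/$f"
      missing=1
    fi
  done
  for d in images sam/mask; do
    if [ ! -d "$scene/$d" ]; then
      echo "missing directory: $scene/$d"
      missing=1
    fi
  done
  # Assumption: each frame images/<name>.jpg has a mask sam/mask/<name>.png.
  for img in "$scene"/images/*.jpg; do
    [ -e "$img" ] || continue   # glob matched nothing
    name="$(basename "$img" .jpg)"
    if [ ! -f "$scene/sam/mask/$name.png" ]; then
      echo "missing mask for frame: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_scene_layout data/scene0000_00 && echo "layout OK"` prints nothing but the confirmation when every expected entry is present.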
We provide three example scenes to help you get started.
python train_semantic.py -s data/scene0000_00 \
-m train_semanticgs \
--use_seg_feature --iterations 2500 \
--load_filter_segmap --consider_negative_labels
- It is normal for training to appear stuck at the DBSCAN filter stage, since the background Gaussian points may be split into multiple regions.
- Use --consider_negative_labels to suppress floaters during background segmentation.
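The per-scene steps above (segmentation, 2DGS training, copying the point cloud, semantic training) can be chained in one wrapper. This is a sketch under stated assumptions, not a script from the repository: `prepare_scene`, `run_cmd`, and the `DRY_RUN` switch are hypothetical names, and locating the trained point cloud assumes the `point_cloud/iteration_<N>/point_cloud.ply` output layout of the reference 2DGS code. Because `run_segmentation.sh` is launched from inside `semantic_modules/CropFormer`, the wrapper passes it an absolute scene path so that a relative `$DATA_DIR` does not resolve against the wrong directory.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of SimRecon: chain the per-scene steps.
# Run from the repository root. DRY_RUN=1 prints commands without running them.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_scene() {
  local scene="$1"                       # e.g. data/scene0000_00
  local name abs ply="" best=-1 p n
  name="$(basename "$scene")"
  # Absolute path, since segmentation runs from inside CropFormer.
  case "$scene" in
    /*) abs="$scene" ;;
    *)  abs="$PWD/$scene" ;;
  esac

  # 1) CropFormer instance segmentation, in a subshell so our cwd is unchanged.
  ( run_cmd cd semantic_modules/CropFormer && run_cmd bash run_segmentation.sh "$abs" ) || return 1

  # 2) 2DGS training, as in the README.
  run_cmd python train_2dgs.py -s "$scene" -m "output/$name" || return 1

  # 3) Pick the highest iteration_<N> point cloud (assumed 2DGS output layout).
  for p in "output/$name"/point_cloud/iteration_*/point_cloud.ply; do
    [ -f "$p" ] || continue
    n="${p%/point_cloud.ply}"
    n="${n##*iteration_}"
    if [ "$n" -gt "$best" ]; then
      best="$n"
      ply="$p"
    fi
  done
  if [ -z "$ply" ] && [ "${DRY_RUN:-0}" != "1" ]; then
    echo "no 2DGS point cloud found under output/$name" >&2
    return 1
  fi
  run_cmd cp "${ply:-output/$name/point_cloud/iteration_<N>/point_cloud.ply}" "$scene/point_cloud.ply" || return 1

  # 4) Semantic Gaussian training with the flags from the README.
  run_cmd python train_semantic.py -s "$scene" -m train_semanticgs \
    --use_seg_feature --iterations 2500 \
    --load_filter_segmap --consider_negative_labels
}
```

Running `DRY_RUN=1 prepare_scene data/scene0000_00` first shows exactly which commands would execute; dropping `DRY_RUN=1` runs them in order and stops at the first failure.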
python optimize_by_avo.py --source_path data/scene0000_00 --label_dir output/data/scene0000_00/train_semanticgs/point_cloud/iteration_2500 --max_iterations 100
- Use --instance_id to specify the ID of the object to optimize. If it is not specified, all objects are optimized.
- Before optimization, you can inspect the point cloud of each object under output/data/scene0000_00/train_semanticgs/point_cloud/iteration_2500/label_pointclouds.
- If AVO results are unsatisfactory, try adjusting hyperparameters such as the learning rates for rotation and translation and the depth-constraint coefficients. The number of optimization rounds can also be increased, up to 5000.
- If optimization fails because of artifact drift in the 2DGS output point cloud, consider recent works dedicated to solving this problem.
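AVO can also be run one object at a time through `--instance_id`, which makes it easy to spend a larger iteration budget only on objects that need it. The helper below is a hypothetical sketch, not part of SimRecon: instance IDs are passed by hand (for example, after inspecting `label_pointclouds`), and the `--label_dir` path is inferred from the example command above by prefixing `output/` to the source path, which may not hold for other setups.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SimRecon: run AVO per instance so that a
# failure on one object does not stop the rest.
# Usage: avo_instances <scene_dir> <max_iterations> <id> [<id> ...]
# DRY_RUN=1 prints the commands instead of running them.
avo_instances() {
  local scene="$1" iters="$2"
  shift 2
  # Inferred from the README example: output/<source_path>/train_semanticgs/...
  local label_dir="output/$scene/train_semanticgs/point_cloud/iteration_2500"
  local id
  local failed=()
  for id in "$@"; do
    local cmd=(python optimize_by_avo.py --source_path "$scene" \
      --label_dir "$label_dir" --max_iterations "$iters" --instance_id "$id")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    elif ! "${cmd[@]}"; then
      failed+=("$id")
    fi
  done
  # Report every instance that failed, then signal overall failure.
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "AVO failed for instances: ${failed[*]}" >&2
    return 1
  fi
}
```

For instance, `avo_instances data/scene0000_00 1000 3 7` would optimize only instances 3 and 7 with 1000 iterations each; the README notes that up to 5000 rounds can be used.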
We are grateful to the following great works, on which we built when implementing SimRecon:
- 2DGS, InstaScene, Rodin, SAM
@misc{xia2026simreconsimreadycompositionalscene,
title={SimRecon: SimReady Compositional Scene Reconstruction from Real Videos},
author={Chong Xia and Kai Zhu and Zizhuo Wang and Fangfu Liu and Zhizheng Zhang and Yueqi Duan},
year={2026},
eprint={2603.02133},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.02133},
}
