Yiheng Xiong and Angela Dai
Official implementation for BMVC 2024 paper "PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images".
We propose a new approach for generating the probabilistic distribution of 3D shape reconstructions conditioned on a highly ambiguous RGB image, enabling multiple diverse sampled hypotheses during inference.
Please set up the environment using conda:

```
conda env create -f pt43d.yaml
conda activate pt43d
```
- Synthetic Data
We render CAD models from ShapeNet to generate synthetic images that mimic real-world challenges, including occlusion and field-of-view truncation. Synthetic training / validation data can be downloaded here. We follow the official splits provided by ShapeNet. For each CAD model, we create 21 renderings to capture varying degrees of ambiguity. Each rendering can map to multiple potential ground-truth CAD models. Specifically, `*.txt` contains the ground-truth CAD id(s) for `*.png`, where `*` ranges from 0 to 20.
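As an illustration of this layout, the ground-truth CAD id(s) for a rendering can be read from its matching `*.txt` file. A minimal sketch, assuming one whitespace-separated id per line (the helper name and id format are our assumptions, not the repo's API):

```python
from pathlib import Path
import tempfile

# Mock annotation illustrating the layout: each rendering i.png
# has a matching i.txt listing its ground-truth CAD id(s).
data_dir = Path(tempfile.mkdtemp())
(data_dir / "3.txt").write_text("1a2b3c4d\n5e6f7a8b\n")

def load_gt_cad_ids(data_dir: Path, rendering_idx: int) -> list[str]:
    """Return the ground-truth CAD id(s) for rendering `<idx>.png`."""
    txt_path = data_dir / f"{rendering_idx}.txt"
    return txt_path.read_text().split()

# A single rendering may map to multiple ground-truth CAD models.
ids = load_gt_cad_ids(data_dir, 3)
```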
- Real-World Data
We adopt real-world image data from ScanNet for our experiments. The training set can be downloaded here and the validation set can be downloaded here. We generate per-instance images without background using masks provided by ground-truth annotations (training set) or by the segmentation model ODISE (validation set). We align each instance image to its ground-truth CAD model using annotations provided by Scan2CAD. For each category, the images are indexed from 0. Specifically, `*.jpg` is the instance image without background, `*.npy` contains the visible points, `*.pt` contains the resized tensor (without normalization) for the corresponding instance image, `*.txt` contains the ground-truth CAD id, `*_mask.jpg` is the mask, and `*_original_image.txt` contains the path of the original image in ScanNet, where `*` starts from 0.
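The per-instance files for a given index can be grouped programmatically. A small sketch assembling the expected paths (filenames follow the listing above; the helper name and category-directory layout are assumptions):

```python
from pathlib import Path

def instance_files(category_dir: Path, idx: int) -> dict[str, Path]:
    """Map each per-instance artifact to its expected path, per the layout above."""
    return {
        "image": category_dir / f"{idx}.jpg",           # instance image without background
        "visible_points": category_dir / f"{idx}.npy",  # visible points
        "tensor": category_dir / f"{idx}.pt",           # resized, unnormalized image tensor
        "gt_cad_id": category_dir / f"{idx}.txt",       # ground-truth CAD id
        "mask": category_dir / f"{idx}_mask.jpg",       # instance mask
        "original_image": category_dir / f"{idx}_original_image.txt",  # path into ScanNet
    }

files = instance_files(Path("chair"), 0)
```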
- First train the P-VQ-VAE on ShapeNet:

```
./launchers/train_pvqvae_snet.sh
```
- Then extract the code for each sample of ShapeNet (caching them for training the transformer):

```
./launchers/extract_pvqvae_snet.sh
```
- Train the probabilistic transformer to learn the shape distribution conditioned on an RGB image:

```
./launchers/train_pt43d.sh
```
- PT43D trained on synthetic training pairs.
- PT43D fine-tuned on real-world training pairs.
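Once trained, the transformer can be sampled repeatedly at inference to obtain diverse shape hypotheses: it predicts, per token position, a categorical distribution over the P-VQ-VAE codebook, and sampling (rather than taking the argmax) yields different code sequences. An illustrative pure-Python sketch of this categorical sampling (codebook size, sequence length, and helper names are made up for illustration, not the repo's API):

```python
import math
import random

codebook_size, seq_len, num_hypotheses = 512, 8, 3
rng = random.Random(0)

def softmax(logits):
    """Turn a logit vector into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_code_sequence(logits_per_pos, rng):
    """Draw one codebook token per position from softmax(logits)."""
    return [rng.choices(range(codebook_size), weights=softmax(l), k=1)[0]
            for l in logits_per_pos]

# Stand-in for transformer output: one logit vector per token position.
logits = [[rng.gauss(0, 1) for _ in range(codebook_size)] for _ in range(seq_len)]

# Repeated sampling from the same predicted distribution gives multiple
# diverse hypotheses, each decodable into a 3D shape by the P-VQ-VAE decoder.
hypotheses = [sample_code_sequence(logits, rng) for _ in range(num_hypotheses)]
```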
If you find this code helpful, please consider citing:

```
@article{xiong2024pt43d,
  title={PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images},
  author={Xiong, Yiheng and Dai, Angela},
  journal={arXiv preprint arXiv:2405.11914},
  year={2024}
}
```
This work was supported by the ERC Starting Grant SpatialSem (101076253). This code borrows heavily from AutoSDF; thanks for making their code available!