Yiheng Xiong and Angela Dai
Official implementation for BMVC 2024 paper "PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images".
We propose a new approach for generating the probabilistic distribution of 3D shape reconstructions conditioned on a highly ambiguous RGB image, enabling multiple diverse sampled hypotheses during inference.
Please set up the environment using conda:

```
conda env create -f pt43d.yaml
conda activate pt43d
```
- Synthetic Data
We render CAD models from ShapeNet to generate synthetic images that mimic real-world challenges, including occlusion and field-of-view truncation. Synthetic training / validation data can be downloaded here. We follow the official splits provided by ShapeNet. For each CAD model, we create 21 renderings to capture varying degrees of ambiguity. Each rendering can map to multiple potential ground-truth CAD models. Specifically, `*.txt` contains the ground-truth CAD id(s) for `*.png`, where `*` ranges from 0 to 20.
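As an illustration of this layout, the ground-truth CAD id(s) for a rendering can be read from its matching `*.txt` file. A minimal sketch, assuming one whitespace-separated id per line (the helper name and id format are our assumptions, not the repo's API):

```python
from pathlib import Path
import tempfile

# Mock annotation illustrating the layout: each rendering i.png
# has a matching i.txt listing its ground-truth CAD id(s).
data_dir = Path(tempfile.mkdtemp())
(data_dir / "3.txt").write_text("1a2b3c4d\n5e6f7a8b\n")

def load_gt_cad_ids(data_dir: Path, rendering_idx: int) -> list[str]:
    """Return the ground-truth CAD id(s) for rendering `<idx>.png`."""
    txt_path = data_dir / f"{rendering_idx}.txt"
    return txt_path.read_text().split()

# A single rendering may map to multiple ground-truth CAD models.
ids = load_gt_cad_ids(data_dir, 3)
```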
- Real-World Data
We adopt real-world image data from ScanNet for our experiments. The training set can be downloaded here and the validation set can be downloaded here. We generate per-instance images without background using masks provided by ground-truth annotations (training set) or by the segmentation model ODISE (validation set). We align each instance image to its ground-truth CAD model using annotations provided by Scan2CAD. For each category, the images are indexed from 0. Specifically, `*.jpg` is the instance image without background, `*.npy` contains the visible points, `*.pt` contains the resized tensor (without normalization) for the corresponding instance image, `*.txt` contains the ground-truth CAD id, `*_mask.jpg` is the mask, and `*_original_image.txt` contains the path of the original image in ScanNet, where `*` starts from 0.
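The per-instance files for a given index can be grouped programmatically. A small sketch assembling the expected paths (filenames follow the listing above; the helper name and category-directory layout are assumptions):

```python
from pathlib import Path

def instance_files(category_dir: Path, idx: int) -> dict[str, Path]:
    """Map each per-instance artifact to its expected path, per the layout above."""
    return {
        "image": category_dir / f"{idx}.jpg",           # instance image without background
        "visible_points": category_dir / f"{idx}.npy",  # visible points
        "tensor": category_dir / f"{idx}.pt",           # resized, unnormalized image tensor
        "gt_cad_id": category_dir / f"{idx}.txt",       # ground-truth CAD id
        "mask": category_dir / f"{idx}_mask.jpg",       # instance mask
        "original_image": category_dir / f"{idx}_original_image.txt",  # path into ScanNet
    }

files = instance_files(Path("chair"), 0)
```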
- First train the P-VQ-VAE on ShapeNet:

```
./launchers/train_pvqvae_snet.sh
```
- Then extract the code for each sample of ShapeNet (caching them for training the transformer):

```
./launchers/extract_pvqvae_snet.sh
```
- Train the probabilistic transformer to learn the shape distribution conditioned on an RGB image:

```
./launchers/train_pt43d.sh
```
- PT43D trained on synthetic training pairs.
- PT43D fine-tuned on real-world training pairs.
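Once trained, the transformer can be sampled repeatedly at inference to obtain diverse shape hypotheses: it predicts, per token position, a categorical distribution over the P-VQ-VAE codebook, and sampling (rather than taking the argmax) yields different code sequences. An illustrative pure-Python sketch of this categorical sampling (codebook size, sequence length, and helper names are made up for illustration, not the repo's API):

```python
import math
import random

codebook_size, seq_len, num_hypotheses = 512, 8, 3
rng = random.Random(0)

def softmax(logits):
    """Turn a logit vector into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_code_sequence(logits_per_pos, rng):
    """Draw one codebook token per position from softmax(logits)."""
    return [rng.choices(range(codebook_size), weights=softmax(l), k=1)[0]
            for l in logits_per_pos]

# Stand-in for transformer output: one logit vector per token position.
logits = [[rng.gauss(0, 1) for _ in range(codebook_size)] for _ in range(seq_len)]

# Repeated sampling from the same predicted distribution gives multiple
# diverse hypotheses, each decodable into a 3D shape by the P-VQ-VAE decoder.
hypotheses = [sample_code_sequence(logits, rng) for _ in range(num_hypotheses)]
```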
If you find this code helpful, please consider citing:

```
@article{xiong2024pt43d,
  title={PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images},
  author={Xiong, Yiheng and Dai, Angela},
  journal={arXiv preprint arXiv:2405.11914},
  year={2024}
}
```
This work was supported by the ERC Starting Grant SpatialSem (101076253). This code borrows heavily from AutoSDF; thanks for making their code available!