Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

This repository contains the official implementation of "MixCon3D" in our paper.

Overview of the MixCon3D. We integrate the image and point cloud modality information, formulating a holistic 3D instance-level representation for cross-modal alignment.

News

[2023.11.2] We release the training code of the MixCon3D.

Installation

Please refer to this instruction for step-by-step installation guidance. Both the necessary packages and some helpful debug experience are provided.

Data Downloading

First, modify the path in the download_data.py. Then, execute the following command to download data from Hugging Face:

python3 download_data.py

The datasets used for experiments are the same as OpenShape. Please refer to OpenShape for more details of the data.

Training

To train the PointBERT model using the MixCon3D, please change the [data_path] to your specified local data path. Then, running the following command:

On an 8 GPU server, training with batchsize 2048

torchrun --nproc_per_node=8 src/main.py --ngpu 8 dataset.folder=data_path dataset.train_batch_size=256 model.name=PointBERT model.scaling=3 model.use_dense=True --trial_name MixCon3D --config src/configs/train.yaml

If the GPU memory is not enough, you can lower the train_batch_size and run with a specific accumulated iter num as follows (The equivalent batchsize = train_batch_size * accum_freq):

torchrun --nproc_per_node=8 src/main.py --ngpu 8 dataset.folder=data_path dataset.train_batch_size=128 dataset.accum_freq=2 model.name=PointBERT model.scaling=3 model.use_dense=True --trial_name MixCon3D --config src/configs/train.yaml

To train on different datasets, use the following command:

(Ensemble_no_LVIS)

torchrun --nproc_per_node=8 src/main.py --ngpu 8 dataset.folder=data_path dataset.train_split=meta_data/split/train_no_lvis.json dataset.train_batch_size=128 dataset.accum_freq=2 model.name=PointBERT model.scaling=3 model.use_dense=True --trial_name MixCon3D --config src/configs/train.yaml

(ShapeNet_only)

torchrun --nproc_per_node=8 src/main.py --ngpu 8 dataset.folder=data_path dataset.train_split=meta_data/split/ablation/train_shapenet_only.json dataset.train_batch_size=128 dataset.accum_freq=1 model.name=PointBERT model.scaling=3 model.use_dense=True --trial_name MixCon3D --config src/configs/train.yaml

We use the wandb for logging.

Acknowledgment

This codebase is based on OpenShape, timm and PointBERT. Thanks to the authors for their awesome contributions! This work is partially supported by TPU Research Cloud (TRC) program and Google Cloud Research Credits program.

Citation

@inproceedings{gao2023mixcon3d,
  title     = {Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training},
  author    = {Yipeng Gao and Zeyu Wang and Wei-Shi Zheng and Cihang Xie and Yuyin Zhou},
  booktitle = {CVPR},
  year      = {2024},
}

Contact

Questions and discussions are welcome via gaoyp23@mail2.sysu.edu.cn or open an issue here.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
figs		figs
src		src
Installation.md		Installation.md
README.md		README.md
download_data.py		download_data.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

figs

figs

src

src

Installation.md

Installation.md

README.md

README.md

download_data.py

download_data.py

Repository files navigation

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

News

Installation

Data Downloading

Training

Acknowledgment

Citation

Contact

About

Releases

Packages

Contributors 2

Languages

UCSC-VLAA/MixCon3D

Folders and files

Latest commit

History

Repository files navigation

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

News

Installation

Data Downloading

Training

Acknowledgment

Citation

Contact

About

Topics

Resources

Stars

Watchers

Forks

Languages