
Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection
Yiming Xie, Huaizu Jiang, Georgia Gkioxari*, Julian Straub*
ICCV 2023

[Real-time demo video]


How to use

Installation

conda env create -f environment.yml
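
After the environment is created, activate it before running any of the commands below. A minimal sketch, assuming the environment defined in environment.yml is named parq (check the name field in environment.yml for the actual name):

# Activate the environment created above; "parq" is an assumption,
# use the name specified in environment.yml.
conda activate parq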

Pretrained Model on ScanNet

Download the pretrained weights and put them under PROJECT_PATH/checkpoint/. You can also use gdown to download them from the command line:

gdown --id 1FuIf1jDPX-ooOx0x-tS69ejhdn9NFuXz
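
If you download from the command line, the sketch below places the weights where the evaluation command further down expects them; the target filename parq_release.ckpt is taken from that command.

# Create the checkpoint directory and download the weights into it.
mkdir -p checkpoint
gdown --id 1FuIf1jDPX-ooOx0x-tS69ejhdn9NFuXz -O checkpoint/parq_release.ckpt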

Data Preparation for ScanNet

Download and extract ScanNet by following the instructions provided at http://www.scan-net.org/.

Expected directory structure of ScanNet:

You can obtain the train/val/test split information from here.

PROJECT_PATH
└───data
│   └───scannet
│   │   └───scans
│   │   │   └───scene0000_00
│   │   │       └───color
│   │   │       │   0.jpg
│   │   │       │   1.jpg
│   │   │       │   ...
│   │   │       ...
│   │   └───scans_raw
│   │   │   └───scene0000_00
│   │   │       └───scene0000_00.aggregation.json
│   │   │       └───scene0000_00_vh_clean_2.labels.ply
│   │   │       └───scene0000_00_vh_clean_2.0.010000.segs.json
│   │   │       ...
│   │   └───scannetv2_test.txt
│   │   └───scannetv2_train.txt
│   │   └───scannetv2_val.txt
│   │   └───scannetv2-labels.combined.tsv
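
As a quick, hypothetical sanity check that your extracted data matches the tree above (standard shell commands, not part of the repository):

# List a few extracted RGB frames and the raw mesh/label files for one scene.
ls data/scannet/scans/scene0000_00/color | head
ls data/scannet/scans_raw/scene0000_00
# Count the scenes listed in each split file.
wc -l data/scannet/scannetv2_train.txt data/scannet/scannetv2_val.txt data/scannet/scannetv2_test.txt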

Next, download the generated oriented box annotations and put them under PROJECT_PATH/data/scannet/.

Alternatively, you can run the data preparation script yourself.

Inference on the ScanNet val set

python eval.py --cfg ./config/eval.yaml CHECKPOINT_PATH ./checkpoint/parq_release.ckpt

Training on ScanNet

Training with 8 GPUs:

python train.py --cfg ./config/train.yaml TRAINER.GPUS 8
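
The trailing key/value pair is a config override; presumably the same pattern works for other GPU counts, e.g. a single-GPU run (an assumption, not a documented option):

# Example: single-GPU training using the same config override mechanism.
python train.py --cfg ./config/train.yaml TRAINER.GPUS 1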

Real-time Demo on Custom Data with Camera Poses from ARKit

We provide a demo of PARQ running on self-captured ARKit data. Please refer to DEMO.md for details on capturing and processing the data. We also provide example data captured with an iPhone XR.

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{xie2023parq,
  title={Pixel-Aligned Recurrent Queries for Multi-View {3D} Object Detection},
  author={Xie, Yiming and Jiang, Huaizu and Gkioxari, Georgia and Straub, Julian},
  booktitle={ICCV},
  year={2023}
}

License

The majority of PARQ is released under the MIT License. The LICENSE-MIT file applies to model/transformer_parq.py; the LICENSE file applies to all other files.

Acknowledgment

We would like to thank the authors of the following projects, on which our code is based: DETR, VoteNet, RotationContinuity, and Pixloc.
