Official repository for the NeurIPS 2024 paper "SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition".


Lu-Feng/SuperVLAD


SuperVLAD

This is the official repository for the NeurIPS 2024 paper "SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition".

Getting Started

This repo follows the GSV-Cities framework for training and the Visual Geo-localization Benchmark for evaluation. You can download the GSV-Cities dataset HERE, and refer to VPR-datasets-downloader to prepare the test datasets.

Each test dataset should be organized in the following directory tree:

├── datasets_vg
    └── datasets
        └── pitts30k
            └── images
                ├── train
                │   ├── database
                │   └── queries
                ├── val
                │   ├── database
                │   └── queries
                └── test
                    ├── database
                    └── queries
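Before running train.py or eval.py, it can help to confirm that a dataset matches this layout. The following is a minimal sketch, not part of the repository: `missing_dirs` is a hypothetical helper, and it assumes paths are built as `<eval_datasets_folder>/<eval_dataset_name>/images/<split>/<database|queries>`, mirroring the tree above.

```python
from pathlib import Path

# Splits and sub-folders taken from the directory tree in this README.
SPLITS = ("train", "val", "test")
SUBSETS = ("database", "queries")


def missing_dirs(datasets_folder, dataset_name):
    """Return the expected image folders that do not exist (hypothetical helper)."""
    images_root = Path(datasets_folder) / dataset_name / "images"
    return [
        str(images_root / split / subset)
        for split in SPLITS
        for subset in SUBSETS
        if not (images_root / split / subset).is_dir()
    ]
```

For example, `missing_dirs("/path/to/your/datasets_vg/datasets", "pitts30k")` returns an empty list when every folder is present, and otherwise lists the folders still to be created.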

Before training, download the pre-trained DINOv2 (ViT-B/14) foundation model HERE.

Train

python3 train.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --foundation_model_path=/path/to/pre-trained/dinov2_vitb14_pretrain.pth --backbone=dino --supervlad_clusters=4 --crossimage_encoder --patience=3 --lr=0.00005 --epochs_num=20 --train_batch_size=120 --freeze_te=8

Test

python3 eval.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --resume=/path/to/trained/model/SuperVLAD.pth --backbone=dino --supervlad_clusters=4 --crossimage_encoder --infer_batch_size=8
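VPR benchmarks such as the Visual Geo-localization Benchmark commonly report Recall@N: the fraction of queries for which at least one correct database image appears among the top N retrieved images. The sketch below illustrates that metric on toy descriptors with exhaustive cosine-similarity search. It is an illustration under assumptions, not eval.py's code: `recall_at_n` and its `positives` argument are hypothetical, and real evaluations typically define positives by a geographic distance threshold (often 25 m) and use an efficient nearest-neighbor index instead of a Python loop.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def recall_at_n(query_descs, db_descs, positives, ns=(1, 5, 10)):
    """Fraction of queries with a correct match among the top-N database images.

    positives[q] is the set of database indices treated as correct for query q
    (hypothetical input; usually database images within a distance threshold).
    """
    db = [l2_normalize(d) for d in db_descs]
    hits = {n: 0 for n in ns}
    for q_idx, q in enumerate(query_descs):
        q = l2_normalize(q)
        # After L2 normalization, cosine similarity is just the dot product.
        sims = [sum(a * b for a, b in zip(q, d)) for d in db]
        ranking = sorted(range(len(db)), key=lambda i: -sims[i])
        for n in ns:
            if any(i in positives[q_idx] for i in ranking[:n]):
                hits[n] += 1
    return {n: hits[n] / len(query_descs) for n in ns}
```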

SuperVLAD without cross-image encoder

Remove the --crossimage_encoder parameter to run SuperVLAD without the cross-image encoder.

1-cluster VLAD

Set --supervlad_clusters=1 and --ghost_clusters=2 to run 1-cluster VLAD, for example:

python3 eval.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --resume=/path/to/trained/model/1-clusterVLAD.pth --backbone=dino --supervlad_clusters=1 --ghost_clusters=2
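The --ghost_clusters flag refers to extra clusters in the soft-assignment step whose aggregated vectors are discarded (the idea introduced by GhostVLAD). The pure-Python sketch below illustrates that idea under stated assumptions: the function and parameter names are hypothetical, the assignment is assumed to be a linear layer followed by a softmax over K real and G ghost clusters, and, following our reading of SuperVLAD, features are aggregated without subtracting cluster centers. It is not the repository's implementation. In this sketch, with K = 1 and no ghost clusters the softmax would always output 1, reducing aggregation to plain sum pooling; the ghost clusters are what make the single-cluster weights informative. The output length is K times the feature dimension D.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def aggregate_with_ghosts(features, weights, biases, num_clusters):
    """Hypothetical VLAD-style aggregation with ghost clusters.

    features: N local descriptors, each of length D.
    weights, biases: assignment parameters for K real + G ghost clusters
    (rows 0..K-1 are real clusters, the remaining rows are ghosts).
    """
    dim = len(features[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]
    for x in features:
        # Linear assignment logits for all K + G clusters.
        logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        assign = softmax(logits)
        # Accumulate only the K real clusters; mass on ghost clusters is dropped.
        for k in range(num_clusters):
            for j in range(dim):
                vlad[k][j] += assign[k] * x[j]
    # Intra-normalize each cluster vector, concatenate, then L2-normalize the result.
    desc = [v for cluster in vlad for v in l2_normalize(cluster)]
    return l2_normalize(desc)
```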

Training with Automatic Mixed Precision

To train models with Automatic Mixed Precision for faster training and lower GPU memory usage, add the --mixed_precision parameter. In this case, the cross-image encoder is not optimized separately and may not perform well.
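When trying several of the variants described above (with or without --crossimage_encoder, with or without --mixed_precision), it may be convenient to assemble the training command programmatically. This is a minimal sketch: `build_train_command` and `TRAIN_FLAGS` are hypothetical names, and the flag values are copied from the training command in this README, with placeholder paths.

```python
import shlex

# Flags copied from the training command in this README (paths are placeholders).
TRAIN_FLAGS = {
    "eval_datasets_folder": "/path/to/your/datasets_vg/datasets",
    "eval_dataset_name": "msls",
    "foundation_model_path": "/path/to/pre-trained/dinov2_vitb14_pretrain.pth",
    "backbone": "dino",
    "supervlad_clusters": "4",
    "patience": "3",
    "lr": "0.00005",  # kept as a string so it is not rendered as 5e-05
    "epochs_num": "20",
    "train_batch_size": "120",
    "freeze_te": "8",
}


def build_train_command(crossimage_encoder=True, mixed_precision=False, flags=None):
    """Return the argv list for train.py with optional switches (hypothetical helper)."""
    flags = TRAIN_FLAGS if flags is None else flags
    argv = ["python3", "train.py"] + [f"--{k}={v}" for k, v in flags.items()]
    if crossimage_encoder:
        argv.append("--crossimage_encoder")  # omit to train without the cross-image encoder
    if mixed_precision:
        argv.append("--mixed_precision")  # AMP: cross-image encoder not optimized separately
    return argv


if __name__ == "__main__":
    # Print a shell-ready command for AMP training.
    print(shlex.join(build_train_command(mixed_precision=True)))
```

The returned list can be passed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting issues; `shlex.join` is used here only to print a copy-pasteable command.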

Trained Model

Model           Cross-image encoder   Download
SuperVLAD                             LINK
SuperVLAD                             LINK
1-ClusterVLAD                         LINK

Acknowledgements

Parts of this repo are inspired by the following repositories:

GSV-Cities

Visual Geo-localization Benchmark

DINOv2

Citation

If you find this repo useful for your research, please cite the paper:

@inproceedings{lu2024supervlad,
  title={SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition},
  author={Lu, Feng and Zhang, Xinyao and Ye, Canming and Dong, Shuting and Zhang, Lijun and Lan, Xiangyuan and Yuan, Chun},
  booktitle={Advances in Neural Information Processing Systems},
  volume={37},
  pages={5789--5816},
  year={2024}
}
