This is the official repository for the NeurIPS 2024 paper "SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition".
This repo follows the framework of GSV-Cities for training and the Visual Geo-localization Benchmark for evaluation. You can download the GSV-Cities dataset HERE, and refer to VPR-datasets-downloader to prepare the test datasets.
The test datasets should be organized in a directory tree as such:

```
datasets_vg
└── datasets
    └── pitts30k
        └── images
            ├── train
            │   ├── database
            │   └── queries
            ├── val
            │   ├── database
            │   └── queries
            └── test
                ├── database
                └── queries
```
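The layout above can be scaffolded or verified with a few `mkdir -p` commands. This is only a sketch for checking the structure by hand (`pitts30k` stands in for whichever test dataset you prepared; VPR-datasets-downloader normally produces this layout for you):

```shell
# Create the expected directory skeleton for one test dataset (pitts30k shown).
# Each split (train/val/test) needs a database/ and a queries/ subfolder.
for split in train val test; do
  mkdir -p "datasets_vg/datasets/pitts30k/images/${split}/database" \
           "datasets_vg/datasets/pitts30k/images/${split}/queries"
done
```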
Before training, you should download the pre-trained foundation model DINOv2 (ViT-B/14) HERE.
Then run the following command to train SuperVLAD:

```shell
python3 train.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --foundation_model_path=/path/to/pre-trained/dinov2_vitb14_pretrain.pth --backbone=dino --supervlad_clusters=4 --crossimage_encoder --patience=3 --lr=0.00005 --epochs_num=20 --train_batch_size=120 --freeze_te=8
```
To evaluate a trained model:

```shell
python3 eval.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --resume=/path/to/trained/model/SuperVLAD.pth --backbone=dino --supervlad_clusters=4 --crossimage_encoder --infer_batch_size=8
```
Remove the `--crossimage_encoder` flag to run SuperVLAD without the cross-image encoder.
Set `--supervlad_clusters=1` and `--ghost_clusters=2` to run 1-cluster VLAD. For example:
```shell
python3 eval.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --resume=/path/to/trained/model/1-clusterVLAD.pth --backbone=dino --supervlad_clusters=1 --ghost_clusters=2
```
If you want to train models with Automatic Mixed Precision for faster training and lower GPU memory usage, just add the `--mixed_precision` flag. Note that in this case the cross-image encoder is not optimized separately and may not perform well.
| Model | Cross-image encoder | Download |
|---|---|---|
| SuperVLAD | ✅ | LINK |
| SuperVLAD | ❌ | LINK |
| 1-ClusterVLAD | ❌ | LINK |
Parts of this repo are inspired by the following repositories:

- Visual Geo-localization Benchmark
If you find this repo useful for your research, please cite the paper:

```bibtex
@inproceedings{lu2024supervlad,
  title={SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition},
  author={Lu, Feng and Zhang, Xinyao and Ye, Canming and Dong, Shuting and Zhang, Lijun and Lan, Xiangyuan and Yuan, Chun},
  booktitle={Advances in Neural Information Processing Systems},
  volume={37},
  pages={5789--5816},
  year={2024}
}
```