This is the official repository for the NeurIPS 2024 paper "SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition".
This repo follows the framework of GSV-Cities for training and the Visual Geo-localization Benchmark for evaluation. You can download the GSV-Cities dataset HERE, and refer to VPR-datasets-downloader to prepare the test datasets.
The test datasets should be organized in a directory tree as such:

```
datasets_vg
└── datasets
    └── pitts30k
        └── images
            ├── train
            │   ├── database
            │   └── queries
            ├── val
            │   ├── database
            │   └── queries
            └── test
                ├── database
                └── queries
```
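The layout above can be scaffolded or verified with a few `mkdir -p` commands. This is only a sketch for checking the structure by hand (`pitts30k` stands in for whichever test dataset you prepared; VPR-datasets-downloader normally produces this layout for you):

```shell
# Create the expected directory skeleton for one test dataset (pitts30k shown).
# Each split (train/val/test) needs a database/ and a queries/ subfolder.
for split in train val test; do
  mkdir -p "datasets_vg/datasets/pitts30k/images/${split}/database" \
           "datasets_vg/datasets/pitts30k/images/${split}/queries"
done
```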
Before training, you should download the pre-trained foundation model DINOv2 (ViT-B/14) HERE.
Then run the following command to train SuperVLAD:

```shell
python3 train.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --foundation_model_path=/path/to/pre-trained/dinov2_vitb14_pretrain.pth --backbone=dino --supervlad_clusters=4 --crossimage_encoder --patience=3 --lr=0.00005 --epochs_num=20 --train_batch_size=120 --freeze_te=8
```
To evaluate a trained model:

```shell
python3 eval.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --resume=/path/to/trained/model/SuperVLAD.pth --backbone=dino --supervlad_clusters=4 --crossimage_encoder --infer_batch_size=8
```
Remove the `--crossimage_encoder` flag to run SuperVLAD without the cross-image encoder.
Set `--supervlad_clusters=1` and `--ghost_clusters=2` to run 1-cluster VLAD. For example:
```shell
python3 eval.py --eval_datasets_folder=/path/to/your/datasets_vg/datasets --eval_dataset_name=msls --resume=/path/to/trained/model/1-clusterVLAD.pth --backbone=dino --supervlad_clusters=1 --ghost_clusters=2
```
If you want to train models with Automatic Mixed Precision for faster training and lower GPU memory usage, just add the `--mixed_precision` flag. Note that in this case the cross-image encoder is not optimized separately and may not perform well.
| Model | Cross-image encoder | Download |
|---|---|---|
| SuperVLAD | ✅ | LINK |
| SuperVLAD | ❌ | LINK |
| 1-ClusterVLAD | ❌ | LINK |
Parts of this repo are inspired by the following repositories:

- Visual Geo-localization Benchmark
If you find this repo useful for your research, please cite the paper:

```bibtex
@inproceedings{lu2024supervlad,
  title={SuperVLAD: Compact and Robust Image Descriptors for Visual Place Recognition},
  author={Lu, Feng and Zhang, Xinyao and Ye, Canming and Dong, Shuting and Zhang, Lijun and Lan, Xiangyuan and Yuan, Chun},
  booktitle={Advances in Neural Information Processing Systems},
  volume={37},
  pages={5789--5816},
  year={2024}
}
```