TVLAD is a PyTorch implementation of our WACV 2023 paper "TransVLAD: Multi-Scale Attention-Based Global Descriptors for Visual Geo-Localization". This work was part of a project with SNCF. If you use this code in your research, please cite our paper. For additional questions, contact us at yifanxu98@163.com or pshams55@gmail.com.
We tested this repo with Python 3.8, PyTorch 1.9.0, and CUDA 10.2. However, it should also run with other recent PyTorch versions (PyTorch >= 1.1.0). Install the package in development mode:
python setup.py develop
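To confirm that an installed PyTorch meets the stated minimum of 1.1.0, a small check like the one below may help. It is only a sketch: `parse_version` and `meets_minimum` are hypothetical helper names, not part of this repo, and the comparison is done on plain version strings so the snippet runs without importing PyTorch.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a string such as '1.9.0+cu102' into (1, 9, 0).

    The local build suffix after '+' is ignored, and parsing stops at the
    first component that does not start with a digit.
    """
    core = version.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.1.0") -> bool:
    """Return True if `installed` is at least `minimum`, compared numerically."""
    return parse_version(installed) >= parse_version(minimum)
```

With PyTorch installed, the check can be run as `meets_minimum(torch.__version__)`. Comparing integer tuples rather than raw strings avoids ranking "1.10.0" below "1.9.0".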
We test our models on three geo-localization benchmarks: the Pittsburgh, Tokyo 24/7, and Tokyo Time Machine datasets. The three datasets can be downloaded here.
The dataset directory is organized as follows:
datasets/data
├── pitts
│   └── raw
│       ├── pitts250k_test.mat
│       ├── pitts250k_train.mat
│       ├── pitts250k_val.mat
│       ├── pitts30k_test.mat
│       ├── pitts30k_train.mat
│       ├── pitts30k_val.mat
│       └── Pittsburgh
│           ├── images/
│           └── queries/
└── tokyo
    └── raw
        ├── tokyo247
        │   ├── images/
        │   └── query/
        ├── tokyo247.mat
        ├── tokyoTM/images/
        ├── tokyoTM_train.mat
        └── tokyoTM_val.mat
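Before training or testing, it can be useful to check that the dataset files are where the tree above expects them. The following is a minimal sketch, not part of the repo: `EXPECTED` and `missing_entries` are hypothetical names, and the paths assume that the `.mat` files and the `Pittsburgh`, `tokyo247`, and `tokyoTM` folders all sit directly under each dataset's `raw` directory.

```python
from pathlib import Path

# Paths relative to datasets/data, read off the directory tree above.
# Assumption: every entry lives under the dataset's raw/ folder.
EXPECTED = {
    "pitts": [
        "pitts/raw/pitts250k_test.mat",
        "pitts/raw/pitts250k_train.mat",
        "pitts/raw/pitts250k_val.mat",
        "pitts/raw/pitts30k_test.mat",
        "pitts/raw/pitts30k_train.mat",
        "pitts/raw/pitts30k_val.mat",
        "pitts/raw/Pittsburgh/images",
        "pitts/raw/Pittsburgh/queries",
    ],
    "tokyo": [
        "tokyo/raw/tokyo247/images",
        "tokyo/raw/tokyo247/query",
        "tokyo/raw/tokyo247.mat",
        "tokyo/raw/tokyoTM/images",
        "tokyo/raw/tokyoTM_train.mat",
        "tokyo/raw/tokyoTM_val.mat",
    ],
}


def missing_entries(root, dataset):
    """Return the expected paths for `dataset` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED[dataset] if not (root / rel).exists()]
```

For example, `missing_entries("datasets/data", "pitts")` returns an empty list when the Pittsburgh layout is complete, and otherwise lists the relative paths that still need to be downloaded or moved.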
The pre-trained weights are stored in the following file tree:
logs
├── mbv3_large.pth.tar # refer to (1)
└── mobilenetv3_large_pitts_64_desc_cen.hdf5 # refer to (2)
(1) ImageNet-pretrained weights for the CNN backbone
This file contains either the ImageNet-pretrained weights for the CNN backbone or the pretrained weights for the whole model.
(2) Initial cluster centers for the VLAD layer
Note that the VLAD layer cannot work with random initialization. Use either the original cluster centers provided by NetVLAD or cluster centers you compute yourself by running scripts/cluster.sh:
bash scripts/cluster.sh mobilenetv3_large
Train by running the script scripts/train_tvlad_dist.sh in a terminal.
Format:
bash scripts/train_tvlad_dist.sh arch
where arch is the backbone name, such as mobilenetv3_large.
For example:
bash scripts/train_tvlad_dist.sh mobilenetv3_large
In train_tvlad_dist.sh, to speed up training, increase GPUS to use more GPUs, or increase --tuple-size to process more tuples on each GPU. If your GPU does not have enough memory, reduce --pos-num or --neg-num to use fewer positives or negatives in each tuple.
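To see how these settings affect memory use, the arithmetic can be sketched as below. This is an illustration under stated assumptions, not the repo's code: it assumes each tuple holds one query image plus --pos-num positives and --neg-num negatives, and `images_per_step` is a hypothetical helper. The numbers in the usage note are made-up values, not the script's defaults.

```python
def images_per_step(tuple_size, pos_num, neg_num, gpus=1):
    """Estimate how many images are processed in one training step.

    Assumption: one tuple = 1 query + pos_num positives + neg_num negatives.
    Returns (images per GPU, images across all GPUs).
    """
    if tuple_size < 1 or gpus < 1:
        raise ValueError("tuple_size and gpus must be at least 1")
    if pos_num < 0 or neg_num < 0:
        raise ValueError("pos_num and neg_num must not be negative")
    images_per_tuple = 1 + pos_num + neg_num
    per_gpu = tuple_size * images_per_tuple
    return per_gpu, per_gpu * gpus
```

Under these assumptions, `images_per_step(4, 10, 10, gpus=4)` gives 84 images per GPU and 336 in total, while halving the negatives with `images_per_step(4, 10, 5, gpus=4)` drops the per-GPU load to 64. Per-GPU memory scales with the first number, which is why reducing --pos-num or --neg-num helps when memory is tight.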
Test by running the script scripts/test_dist.sh in a terminal.
Format:
bash scripts/test_dist.sh resume arch dataset scale
where resume is the path to the trained model, arch is the backbone name (such as mobilenetv3_large), and dataset and scale select the benchmark (such as pitts 30k, pitts 250k, or tokyo).
For example:
- Test mobilenetv3_large on pitts 250k:
bash scripts/test_dist.sh logs/netVLAD/pitts30k-mobilenetv3_large/model_best.pth.tar mobilenetv3_large pitts 250k
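When several checkpoints or benchmarks need to be evaluated, the command line above can be assembled programmatically. The sketch below only builds the argument list in the documented order (resume, arch, dataset, scale). `build_test_command` is a hypothetical helper, and treating scale as optional for tokyo is an assumption based on the examples above.

```python
import shlex


def build_test_command(resume, arch, dataset, scale=None,
                       script="scripts/test_dist.sh"):
    """Build the argv list for one run of the test script.

    Assumption: `scale` is omitted when the dataset has no scale (e.g. tokyo).
    """
    cmd = ["bash", script, resume, arch, dataset]
    if scale is not None:
        cmd.append(scale)
    return cmd


def as_shell_string(cmd):
    """Render an argv list as a shell-safe string for logging."""
    return shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Keeping the arguments as a list, rather than a formatted string, avoids quoting problems when a checkpoint path contains spaces.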
In test_dist.sh, to speed up testing, increase GPUS to use more GPUs, or increase --test-batch-size on each GPU. If your GPU does not have enough memory, reduce --test-batch-size.