HAM-VPR

This is the official repository for the paper "High-Level Adaptive Feature Enhancement and Attention Mask-Guided Aggregation for Visual Place Recognition".

Summary

HAM-VPR is an enhanced Visual Place Recognition (VPR) framework designed to improve robustness against challenges such as dynamic occlusion and viewpoint variation. Key innovations include:

  1. High-Level Adaptive Feature Enhancement
    • Integrates a lightweight AdapterFormer module into DINOv2's Transformer blocks to enhance semantic adaptability while preserving fine-grained features (see the first sketch after this list).
    • Reduces parameter redundancy and generates structured segmentation feature maps, bridging the gap between pre-trained models and VPR tasks.
  2. Attention Mask-Guided Aggregation
    • A lightweight attention module generates implicit masks that guide global feature aggregation, suppressing irrelevant regions and amplifying discriminative areas (see the second sketch after this list).
    • Two-stage training ensures seamless fusion of mask and segmentation features without re-extracting base features.
  3. Dataset & Validation
    • Introduces the VPR-City-Mask dataset (derived from GSV-City) with region annotations for real-world mask validation.
    • Achieves state-of-the-art performance on multiple VPR benchmarks, demonstrating scalability and robustness.
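
The adapter idea in (1) can be pictured with a short sketch. This is not the repository's code: it is a minimal PyTorch illustration, assuming an AdaptFormer-style bottleneck adapter attached in parallel with a frozen transformer block. The paper inserts the adapter inside DINOv2's blocks; wrapping the whole block here just keeps the sketch self-contained, and all names and dimensions are illustrative.

```python
# A minimal sketch, NOT the repository's code: an AdaptFormer-style bottleneck
# adapter added in parallel with a frozen transformer block. Names and
# dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> GELU -> up-project, scaled; starts as an identity."""
    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        self.scale = scale
        nn.init.zeros_(self.up.weight)  # zero-init so the pre-trained
        nn.init.zeros_(self.up.bias)    # behaviour is preserved at step 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    """Freezes a pre-trained block; only the adapter receives gradients."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False
        self.adapter = BottleneckAdapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x) + self.adapter(x)  # parallel residual path
```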
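Likewise, here is a minimal sketch of the mask-guided aggregation in (2), assuming the attention module is a small MLP that scores each patch token and the descriptor is an attention-weighted pooling; shapes and module names are assumptions, not the actual implementation.

```python
# A minimal sketch, NOT the actual implementation: a small MLP scores each
# patch token, the softmax of the scores acts as an implicit spatial mask,
# and the global descriptor is the mask-weighted sum of tokens.
import torch
import torch.nn as nn

class MaskGuidedPool(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, 1)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch features from the backbone
        mask = torch.softmax(self.score(tokens), dim=1)  # (B, N, 1) implicit mask
        desc = (mask * tokens).sum(dim=1)                # (B, D) weighted pooling
        return nn.functional.normalize(desc, dim=-1)     # L2-normalized descriptor

pool = MaskGuidedPool(dim=1024)               # ViT-L/14 token width
print(pool(torch.randn(2, 256, 1024)).shape)  # torch.Size([2, 1024])
```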

Dataset

The dataset should be organized in a directory tree as follows:

datasets_vpr
└── datasets
    └── VPR-City-Mask
        └── images
            ├── train
            │   ├── database
            │   ├── database_mask
            │   ├── queries
            │   └── queries_mask
            ├── val
            │   ├── database
            │   ├── database_mask
            │   ├── queries
            │   └── queries_mask
            └── test
                ├── database
                ├── database_mask
                ├── queries
                └── queries_mask
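
As a minimal sketch (not the repository's loader), the following pairs each image with its mask, assuming mask filenames mirror the image filenames in the sibling *_mask directory; the naming convention is an assumption.

```python
# A minimal sketch, NOT the repository's loader: pairs each image with its
# mask, assuming mask filenames mirror image filenames in the sibling
# *_mask directory (this naming convention is an assumption).
from pathlib import Path

def paired_images(split_dir: str, subset: str = "database"):
    img_dir = Path(split_dir) / subset
    mask_dir = Path(split_dir) / f"{subset}_mask"
    for img in sorted(img_dir.glob("*")):
        mask = mask_dir / img.name
        yield img, (mask if mask.exists() else None)

# e.g. iterate the training queries and their masks
root = "datasets_vpr/datasets/VPR-City-Mask/images/train"
for img, mask in paired_images(root, "queries"):
    pass  # load img (and mask, if present) here
```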

We use the pre-trained foundation model DINOv2 (ViT-L/14), available from the official DINOv2 repository, as the basis for fine-tuning.
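
For reference, the ViT-L/14 backbone can be loaded through DINOv2's official torch.hub entry point; freezing it before fine-tuning is our assumption here, mirroring the summary above.

```python
# The official torch.hub entry point for DINOv2 ViT-L/14; freezing the
# backbone before adapter fine-tuning is our assumption, per the summary.
import torch

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
for p in backbone.parameters():
    p.requires_grad = False  # train only adapters / aggregation layers
```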

Performance results of trained models

The model is fine-tuned on VPR-City-Mask (for diverse scenes).

| Benchmark | R@1 | R@5 | R@10 |
| --- | --- | --- | --- |
| Pitts30k-test | 89.7 | 95.9 | 96.6 |
| Pitts250k-test | 93.7 | 98.2 | 98.6 |
| MSLS-val | 83.6 | 93.0 | 95.0 |
| Tokyo24/7 | 85.6 | 92.2 | 94.3 |
| SF-XL-testv1 | 76.9 | 83.6 | 80.5 |

Test

Set rerank_num=100 to reproduce the results in the paper, or set rerank_num=20 for a close result at 1/5 of the re-ranking runtime (0.018 s per query).

python3 eval.py --datasets_folder=/path/to/your/datasets_vpr/datasets --dataset_name=pitts30k --resume=./weight/HAM-VPR.pth
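
For context, here is a minimal sketch of the Recall@N metric reported in the table above (not the repository's evaluation code): a query counts as correct at N if any of its top-N retrieved database images is a ground-truth positive.

```python
# A minimal sketch of Recall@N, NOT the repository's eval code: a query is
# correct at N if any of its top-N candidates is a true positive.
import numpy as np

def recall_at_n(preds: np.ndarray, positives, ns=(1, 5, 10)):
    """preds: (num_queries, k) ranked database indices;
    positives: per-query collections of ground-truth database indices."""
    hits = {n: 0 for n in ns}
    for q, topk in enumerate(preds):
        for n in ns:
            if any(int(i) in positives[q] for i in topk[:n]):
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(preds) for n in ns}
```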

Acknowledgements

Parts of this repo are inspired by the following repositories:

DINOv2

SegFormer

Citation

If you find this repo useful for your research, please consider leaving a star ⭐️ and citing the paper:

@inproceedings{HAM-VPR,
  title={High-Level Adaptive Feature Enhancement and Attention Mask-Guided Aggregation for Visual Place Recognition},
  author={Wang, Longhao and Lan, Chaozhen and Wu, Beibei and Yao, Fushan and Wei, Zijun and Gao, Tian and Yu, Hanyang},
  booktitle={***},
  year={2025}
}
