HAM-VPR

This is the official repository for the paper "High-Level Adaptive Feature Enhancement and Attention Mask-Guided Aggregation for Visual Place Recognition".

Summary

HAM-VPR is an enhanced Visual Place Recognition (VPR) framework designed to improve robustness against challenges such as dynamic occlusion and viewpoint variation. Key innovations include:

  1. High-Level Adaptive Feature Enhancement
    • Integrates a lightweight AdapterFormer module into DINOv2's Transformer blocks to enhance semantic adaptability while preserving fine-grained features (see the first sketch after this list).
    • Reduces parameter redundancy and generates structured segmentation feature maps, bridging the gap between pre-trained models and VPR tasks.
  2. Attention Mask-Guided Aggregation
    • A lightweight attention module generates implicit masks that guide global feature aggregation, suppressing irrelevant regions and amplifying discriminative areas (see the second sketch after this list).
    • Two-stage training ensures seamless fusion of mask and segmentation features without re-extracting base features.
  3. Dataset & Validation
    • Introduces the VPR-City-Mask dataset (derived from GSV-City) with region annotations for real-world mask validation.
    • Achieves state-of-the-art performance on multiple VPR benchmarks, demonstrating scalability and robustness.
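
The adapter idea in (1) can be pictured with a short sketch. This is not the repository's code: it is a minimal PyTorch illustration, assuming an AdaptFormer-style bottleneck adapter attached in parallel with a frozen transformer block. The paper inserts the adapter inside DINOv2's blocks; wrapping the whole block here just keeps the sketch self-contained, and all names and dimensions are illustrative.

```python
# A minimal sketch, NOT the repository's code: an AdaptFormer-style bottleneck
# adapter added in parallel with a frozen transformer block. Names and
# dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> GELU -> up-project, scaled; starts as an identity."""
    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        self.scale = scale
        nn.init.zeros_(self.up.weight)  # zero-init so the pre-trained
        nn.init.zeros_(self.up.bias)    # behaviour is preserved at step 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    """Freezes a pre-trained block; only the adapter receives gradients."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False
        self.adapter = BottleneckAdapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x) + self.adapter(x)  # parallel residual path
```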
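Likewise, here is a minimal sketch of the mask-guided aggregation in (2), assuming the attention module is a small MLP that scores each patch token and the descriptor is an attention-weighted pooling; shapes and module names are assumptions, not the actual implementation.

```python
# A minimal sketch, NOT the actual implementation: a small MLP scores each
# patch token, the softmax of the scores acts as an implicit spatial mask,
# and the global descriptor is the mask-weighted sum of tokens.
import torch
import torch.nn as nn

class MaskGuidedPool(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, 1)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch features from the backbone
        mask = torch.softmax(self.score(tokens), dim=1)  # (B, N, 1) implicit mask
        desc = (mask * tokens).sum(dim=1)                # (B, D) weighted pooling
        return nn.functional.normalize(desc, dim=-1)     # L2-normalized descriptor

pool = MaskGuidedPool(dim=1024)               # ViT-L/14 token width
print(pool(torch.randn(2, 256, 1024)).shape)  # torch.Size([2, 1024])
```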

Dataset

The dataset should be organized in a directory tree as follows:

datasets_vpr
└── datasets
    └── VPR-City-Mask
        └── images
            ├── train
            │   ├── database
            │   ├── database_mask
            │   ├── queries
            │   └── queries_mask
            ├── val
            │   ├── database
            │   ├── database_mask
            │   ├── queries
            │   └── queries_mask
            └── test
                ├── database
                ├── database_mask
                ├── queries
                └── queries_mask
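
As a minimal sketch (not the repository's loader), the following pairs each image with its mask, assuming mask filenames mirror the image filenames in the sibling *_mask directory; the naming convention is an assumption.

```python
# A minimal sketch, NOT the repository's loader: pairs each image with its
# mask, assuming mask filenames mirror image filenames in the sibling
# *_mask directory (this naming convention is an assumption).
from pathlib import Path

def paired_images(split_dir: str, subset: str = "database"):
    img_dir = Path(split_dir) / subset
    mask_dir = Path(split_dir) / f"{subset}_mask"
    for img in sorted(img_dir.glob("*")):
        mask = mask_dir / img.name
        yield img, (mask if mask.exists() else None)

# e.g. iterate the training queries and their masks
root = "datasets_vpr/datasets/VPR-City-Mask/images/train"
for img, mask in paired_images(root, "queries"):
    pass  # load img (and mask, if present) here
```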

We use the pre-trained foundation model DINOv2 (ViT-L/14), available from the official DINOv2 repository, as the basis for fine-tuning.
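
For reference, the ViT-L/14 backbone can be loaded through DINOv2's official torch.hub entry point; freezing it before fine-tuning is our assumption here, mirroring the summary above.

```python
# The official torch.hub entry point for DINOv2 ViT-L/14; freezing the
# backbone before adapter fine-tuning is our assumption, per the summary.
import torch

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
for p in backbone.parameters():
    p.requires_grad = False  # train only adapters / aggregation layers
```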

Performance results of trained models

The model is fine-tuned on VPR-City-Mask (for diverse scenes).

| Benchmark | R@1 | R@5 | R@10 |
| --- | --- | --- | --- |
| Pitts30k-test | 89.7 | 95.9 | 96.6 |
| Pitts250k-test | 93.7 | 98.2 | 98.6 |
| MSLS-val | 83.6 | 93.0 | 95.0 |
| Tokyo24/7 | 85.6 | 92.2 | 94.3 |
| SF-XL-testv1 | 76.9 | 83.6 | 80.5 |

Test

Set rerank_num=100 to reproduce the results in the paper, or set rerank_num=20 for a close result at 1/5 of the re-ranking runtime (0.018 s per query).

python3 eval.py --datasets_folder=/path/to/your/datasets_vpr/datasets --dataset_name=pitts30k --resume=./weight/HAM-VPR.pth
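
For context, here is a minimal sketch of the Recall@N metric reported in the table above (not the repository's evaluation code): a query counts as correct at N if any of its top-N retrieved database images is a ground-truth positive.

```python
# A minimal sketch of Recall@N, NOT the repository's eval code: a query is
# correct at N if any of its top-N candidates is a true positive.
import numpy as np

def recall_at_n(preds: np.ndarray, positives, ns=(1, 5, 10)):
    """preds: (num_queries, k) ranked database indices;
    positives: per-query collections of ground-truth database indices."""
    hits = {n: 0 for n in ns}
    for q, topk in enumerate(preds):
        for n in ns:
            if any(int(i) in positives[q] for i in topk[:n]):
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(preds) for n in ns}
```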

Acknowledgements

Parts of this repo are inspired by the following repositories:

DINOv2

SegFormer

Citation

If you find this repo useful for your research, please consider leaving a star ⭐️ and citing the paper:

@inproceedings{HAM-VPR,
  title={High-Level Adaptive Feature Enhancement and Attention Mask-Guided Aggregation for Visual Place Recognition},
  author={Wang, Longhao and Lan, Chaozhen and Wu, Beibei and Yao, Fushan and Wei, Zijun and Gao, Tian and Yu, Hanyang},
  booktitle={***},
  year={2025}
}
