This repository is the official implementation of the ICCV 2025 paper NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning.
- [2025/06/28] 🔥 NAVER code is open-sourced on GitHub.
- [2025/06/25] 🎉 NAVER paper is accepted at ICCV 2025.
We're working on the following TODOs:
- GUI demo.
- Support more LLMs.
- Video demo & presentation slides.
- Python >= 3.10
- conda
Please follow the instructions below to install the required packages and set up the environment.
git clone https://github.com/ControlNet/NAVER
cd NAVER
Option 1: Using pixi (recommended):
pixi install
pixi shell
Option 2: Building from source (you may need to set up CUDA and PyTorch manually):
conda install conda-forge/label/rust_dev::rust=1.78 -c conda-forge -y
pip install "git+https://github.com/scallop-lang/scallop.git@f8fac18#egg=scallopy&subdirectory=etc/scallopy"
pip install -e .
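As a quick sanity check that the build succeeded (run inside the environment you just set up):

python -c "import scallopy, torch; print(torch.cuda.is_available())"

If this prints True, both scallopy and a CUDA-enabled PyTorch are importable.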
Edit the .env file, or export the variables in your shell (see the example after the block below), to configure the environment.
OPENAI_API_KEY=your-api-key # if you want to use OpenAI LLMs
AZURE_OPENAI_URL= # if you want to use Azure OpenAI LLMs
OLLAMA_HOST=http://ollama.server:11434 # if you want to use your own Ollama server for Llama or DeepSeek models
# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
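For example, to set the variables in your shell instead of editing .env (the key value is a placeholder):

export OPENAI_API_KEY=your-api-key
export TORCH_HOME=./pretrained_models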
Run the script below to download the pretrained models to the ./pretrained_models directory.
python -m hydra_vl4ai.download_model --base_config config/refcoco.yaml --model_config config/model_config.yaml --extra_packages naver.tool
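After the script finishes, the downloaded weights should sit under ./pretrained_models; a quick way to confirm:

ls ./pretrained_models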
You may need about 28GB of VRAM to run NAVER. Consider editing ./config/model_config.yaml to load the models across multiple GPUs.
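To check how much VRAM is available before deciding on a single- or multi-GPU layout:

nvidia-smi --query-gpu=name,memory.total --format=csv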
python demo_cli.py \
--image <IMAGE_PATH> \
--query <QUERY> \
--base_config <YOUR-CONFIG-DIR> \
--model_config <MODEL-CONFIG-PATH>
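For example, with the RefCOCO config shipped in this repo (the image path and query below are placeholders):

python demo_cli.py \
  --image ./example.jpg \
  --query "the dog on the left" \
  --base_config config/refcoco.yaml \
  --model_config config/model_config.yaml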
The result will be printed to the console.
python main.py \
--data_root <YOUR-DATA-ROOT> \
--base_config <YOUR-CONFIG-DIR> \
--model_config <MODEL-CONFIG-PATH>
Then the inference results are saved in the ./result directory for evaluation.
python evaluate.py --input <RESULT_JSONL_PATH>
The evaluation results will be printed to the console. Note that LLM outputs are non-deterministic, so the evaluation results may differ slightly from those reported in the paper.
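As a hypothetical end-to-end example on RefCOCO (the data root and result filename are placeholders and may differ on your machine):

python main.py --data_root ./data/refcoco --base_config config/refcoco.yaml --model_config config/model_config.yaml
python evaluate.py --input ./result/refcoco.jsonl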
If you find this work useful for your research, please consider citing it.
@article{cai2025naver,
  title   = {NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning},
  author  = {Cai, Zhixi and Ke, Fucai and Jahangard, Simindokht and Garcia de la Banda, Maria and Haffari, Reza and Stuckey, Peter J. and Rezatofighi, Hamid},
  journal = {arXiv preprint arXiv:2502.00372},
  year    = {2025},
}