OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models

NAACL 2024 Findings

This is the official repository of OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models.

Setup

Dataset Preparation

Please follow the HM3DSem instructions to download the dataset and prepare the data. The data should be organized as follows:

data/
├── objectgoal_hm3d/
│   ├── train/
│   ├── val/
│   └── val_mini/
├── scene_datasets/
│   └── hm3d/
│       ├── minival/
│       └── val/
├── versioned_data/
├── matterport_category_mappings.tsv
└── object_norm_inv_perplexity.npy
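
To verify the layout before running, a quick sanity check along these lines may help (a minimal sketch; the path list simply mirrors the tree above):

import os

# Paths expected by the pipeline, relative to the repo root (mirrors the tree above).
expected = [
    "data/objectgoal_hm3d/train",
    "data/objectgoal_hm3d/val",
    "data/objectgoal_hm3d/val_mini",
    "data/scene_datasets/hm3d/minival",
    "data/scene_datasets/hm3d/val",
    "data/versioned_data",
    "data/matterport_category_mappings.tsv",
    "data/object_norm_inv_perplexity.npy",
]
missing = [p for p in expected if not os.path.exists(p)]
print("All data files found." if not missing else f"Missing: {missing}")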

Checkpoints

Please check out Grounded-SAM to download groundingdino_swint_ogc.pth and sam_vit_h_4b8939.pth, and put them into Grounded_SAM/.
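
Once the Grounded-SAM dependencies (step 3 below) are installed, you can sanity-check the downloaded checkpoints by loading them (a sketch; the GroundingDINO config path is an assumption and may differ in your checkout):

from segment_anything import sam_model_registry
from groundingdino.util.inference import load_model

# Load SAM ViT-H from the downloaded checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="Grounded_SAM/sam_vit_h_4b8939.pth")

# Load GroundingDINO; the config file ships with the GroundingDINO source tree.
dino = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "Grounded_SAM/groundingdino_swint_ogc.pth",
)
print("Checkpoints loaded.")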

Dependencies

  1. Python & PyTorch

    This code was tested with Python 3.9.16 on Ubuntu 20.04 and PyTorch 1.11.0+cu113.

  2. Habitat-Sim & Habitat-Lab

    # Habitat-Sim
    git clone https://github.com/facebookresearch/habitat-sim.git
    cd habitat-sim
    git checkout tags/challenge-2022
    pip install -r requirements.txt
    python setup.py install --headless
    cd ..  # return to the project root before installing Habitat-Lab

    # Habitat-Lab
    git clone https://github.com/facebookresearch/habitat-lab.git
    cd habitat-lab
    git checkout tags/challenge-2022
    pip install -e .
    
  3. Grounded-SAM

    Please check out Grounded-SAM and follow its installation instructions to install the dependencies.

  4. Others

    pip install -r requirements.txt
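
After completing the steps above, you can confirm that the core dependencies import cleanly (a minimal sketch; the version attributes are standard for these packages):

import torch
import habitat
import habitat_sim

# Print versions to confirm the challenge-2022 Habitat stack and CUDA-enabled PyTorch.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("habitat:", habitat.__version__)
print("habitat_sim:", habitat_sim.__version__)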
    

OpenAI API keys

You will need an OpenAI API key to use this repo. Create a file named apikey.txt in the repository root and paste your API key into it.
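
For reference, this is roughly how such a key file is consumed (a sketch of the pattern, not the exact code in main.py; the openai call shown assumes the legacy pre-1.0 client API):

import openai

# Read the key from apikey.txt in the repo root and register it with the client.
with open("apikey.txt") as f:
    openai.api_key = f.read().strip()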

Running

Example

An example command to run the pipeline:

CUDA_VISIBLE_DEVICES=0 python main.py --split val --eval 1 --auto_gpu_config 0 --prompt_type scoring \
-n 1 --num_eval_episodes 100 --text_threshold 0.55 --boundary_coeff 12 --start_episode 0 --tag_freq 100 \
--use_gtsem 0 --num_local_steps 20 --print_images 1 --exp_name test

Visualization

To make a demo video from your saved images, you can either use ffmpeg to make separate videos or use

python make_demo.py --exp_name test # add `--delete_img` to delete images after making video

to make batched videos.
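
If you prefer to script the stitching yourself, here is a minimal sketch using imageio (assumes pip install imageio imageio-ffmpeg; the image directory is hypothetical and depends on where your run saved frames):

import glob
import imageio

# Stitch saved frames into an MP4; adjust the glob to match your image dump directory.
frames = sorted(glob.glob("dump/test/episodes/*/*.png"))
with imageio.get_writer("demo.mp4", fps=10) as writer:
    for path in frames:
        writer.append_data(imageio.imread(path))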

Acknowledgements

This repo is heavily based on L3MVN. We thank the authors for their great work.

Citation

If you find this work helpful, please consider citing:

@inproceedings{kuang2024openfmnav,
    title={Open{FMN}av: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models},
    author={Yuxuan Kuang and Hai Lin and Meng Jiang},
    booktitle={Findings of the Association for Computational Linguistics: NAACL 2024},
    year={2024}
}
