
CLIP-DINOiser: Teaching CLIP a few DINO tricks for Open-Vocabulary Semantic Segmentation

Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez

[Teaser figure: teaser_v2.png]

[Demo video: running_dog.2.mp4]

Official PyTorch implementation of CLIP-DINOiser: Teaching CLIP a few DINO tricks.

@article{wysoczanska2023clipdino,
        title={CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation},
        author={Wysocza{\'n}ska, Monika and Sim{\'e}oni, Oriane and Ramamonjisoa, Micha{\"e}l and Bursuc, Andrei and Trzci{\'n}ski, Tomasz and P{\'e}rez, Patrick},
        journal={arxiv},
        year={2023}
}
Updates
  • [27/03/2024] Training code released. Updated weights to the ImageNet-trained version. Modified the MaskCLIP code to load weights directly from the OpenCLIP model.
  • [20/12/2023] Code release

Demo

Try our model!

Requirements

Set up the environment:

# Create conda environment
conda create -n clipdino python=3.9
conda activate clipdino
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=[your CUDA version] -c pytorch
pip install -r requirements.txt
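
A quick, optional sanity check that the environment was created correctly (it only verifies the pinned PyTorch install and GPU visibility):

# Optional: verify PyTorch/torchvision versions and CUDA availability
import torch
import torchvision
print(torch.__version__, torchvision.__version__)  # expected: 1.12.1, 0.13.1
print(torch.cuda.is_available())                   # should print True on a GPU machine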

You will also need to install MMCV and MMSegmentation by running:

pip install -U openmim
mim install mmengine
mim install "mmcv-full==1.6.0"
mim install "mmsegmentation==0.27.0"

Running from the notebook

You can try our model in the Jupyter notebook demo.ipynb.

Running from command line

You can also try our demo from the command line:

python demo.py --file_path [path to the image file] --prompts [list of the text prompts separated by ',']

Example:

python demo.py --file_path assets/rusted_van.png --prompts "rusted van,foggy clouds,mountains,green trees" 
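
The prompts are passed as a single comma-separated string. A rough sketch of how such arguments are typically consumed (the exact parsing inside demo.py may differ):

# Sketch of the demo's command-line interface (illustrative, not the actual demo.py code)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--file_path", required=True, help="path to the image file")
parser.add_argument("--prompts", required=True, help="text prompts separated by ','")
args = parser.parse_args()

prompts = [p.strip() for p in args.prompts.split(",")]
# e.g. ["rusted van", "foggy clouds", "mountains", "green trees"]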

Reproducing results

Dataset preparation

In the paper, following previous works, we use 8 benchmarks: (i) with a background class: PASCAL VOC, PASCAL Context, and COCO-Object; and (ii) without a background class: PASCAL VOC20, PASCAL Context59, COCO-Stuff, Cityscapes, and ADE20k.

To run the evaluation, download and set up the PASCAL VOC, PASCAL Context, COCO-Stuff164k, Cityscapes, and ADE20k datasets following the MMSegmentation data preparation document.

COCO Object

The COCO-Object dataset uses only the object classes of the COCO-Stuff164k dataset, obtained from its instance segmentation annotations. Run the following command to convert the instance segmentation annotations into semantic segmentation annotations:

python tools/convert_coco.py data/coco_stuff164k/ -o data/coco_stuff164k/
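
For reference, the conversion amounts to painting each instance mask with its category id. A minimal sketch with pycocotools (paths are illustrative; tools/convert_coco.py is the script actually used):

# Illustrative instance-to-semantic conversion for one image (not the actual tools/convert_coco.py)
import numpy as np
from pycocotools.coco import COCO

coco = COCO("data/coco_stuff164k/annotations/instances_val2017.json")  # assumed annotation path
img_id = coco.getImgIds()[0]
info = coco.loadImgs(img_id)[0]

semantic = np.zeros((info["height"], info["width"]), dtype=np.uint8)   # 0 = background
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    semantic[coco.annToMask(ann) == 1] = ann["category_id"]            # paint instance with its class id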

Running evaluation

To reproduce our results, simply run:

torchrun main_eval.py clip_dinoiser.yaml

or using multiple GPUs:

CUDA_VISIBLE_DEVICES=[0,1..] torchrun --nproc_per_node=auto main_eval.py clip_dinoiser.yaml
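
Under torchrun, each spawned process usually binds to one of the visible GPUs via its LOCAL_RANK before joining the process group. A sketch of that standard pattern (main_eval.py's actual distributed setup may differ):

# Standard torchrun device-binding pattern (sketch; not necessarily main_eval.py's exact code)
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by torchrun for each process
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")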

Training

Hardware requirements: training needs a single GPU with ~14 GB of memory. On an NVIDIA A5000, training takes approximately 3 hours.

Dataset preparation

Download ImageNet and update the ImageNet folder path in the configs/clip_dinoiser.yaml file.

Install FOUND

Install FOUND by running:

cd models
git clone git@github.com:valeoai/FOUND.git
cd FOUND
git clone https://github.com/facebookresearch/dino.git
cd dino
touch __init__.py
echo -e "import sys\nfrom os.path import dirname, join\nsys.path.insert(0, join(dirname(__file__), '.'))" >> __init__.py
cd ../

Run training

To launch training, simply run:

CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=auto train.py clip_dinoiser.yaml

Currently, we only support single-GPU training.

Acknowledgments

This repo heavily relies on the following projects:

Thanks to the authors!
