SLIP

Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP). [Paper](https://arxiv.org/abs/2405.07284)

Goal

The goal of this project is to extend SAM (Segment Anything Model) [1] with text prompts by using CLIP (Contrastive Language-Image Pre-training) [2]. The resulting integration, SLIP (SAM with CLIP), aims to segment objects without prior training on specific classes or categories.
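
One plausible way to picture the integration is as a ranking step: SAM proposes candidate regions, CLIP embeds each region and the text prompt, and the region most similar to the prompt is kept. The sketch below illustrates only that ranking step, using small hand-written vectors in place of real CLIP embeddings. The function names `cosine` and `pick_mask` and the toy vectors are hypothetical, and this README does not confirm that the notebooks select masks exactly this way.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pick_mask(text_embedding, region_embeddings):
    """Return (index, score) of the region that best matches the text prompt."""
    scores = [cosine(text_embedding, r) for r in region_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy stand-ins (assumption): in a real pipeline the text vector would come
# from CLIP's text encoder and each region vector from CLIP's image encoder
# applied to a crop of a SAM-proposed mask.
text_embedding = [1.0, 0.0, 0.0]
region_embeddings = [
    [0.0, 1.0, 0.0],  # region 0: unrelated to the prompt
    [0.9, 0.1, 0.0],  # region 1: close to the prompt
    [0.5, 0.5, 0.5],  # region 2: partially related
]

best_index, best_score = pick_mask(text_embedding, region_embeddings)
print(best_index, round(best_score, 3))
```

Because the ranking depends only on how well CLIP aligns images with text, a CLIP model finetuned on the target domain can change which region wins without any change to SAM.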

Our Proposed Architecture

Citation

If you use this code or data in your research, please cite the following paper:

@misc{gundavarapu2024zero,
    title={Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)},
    author={Saaketh Koundinya Gundavarapu and Arushi Arora and Shreya Agarwal},
    year={2024},
    eprint={2405.07284},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Repository Structure

SLIP demo/
- zero_shot_finetuned.ipynb - SLIP zero-shot segmentation demo after finetuning CLIP.
- zero_shot_pretrained.ipynb - SLIP zero-shot segmentation demo using pretrained CLIP.
assets/ - Images for plots, the model architecture, and test images.
baseline classifier/
- classifier output/
  - ResNet18_pokemon_output - Text file with the output from training ResNet18 on the Pokemon dataset.
  - VGG_pokemon_output - Text file with the output from training VGG on the Pokemon dataset.
- models/
  - ResNet18.py - ResNet18 model.
  - VGG.py - VGG model.
- run_resnet.sbatch - Script to train ResNet.
- run_vgg.sbatch - Script to train VGG.
evaluation/
- ResNet_eval.ipynb - ResNet evaluation on the Pokemon dataset.
- SLIP_segment_eval.ipynb - Evaluation of SLIP after finetuning CLIP, on the Pokemon dataset.
- make_evalutaion_dataset.py - Creates the evaluation dataset.
- pokedex.csv - Maps each image index to its image class.
- pretrained_eval_segment.ipynb - Evaluation of SLIP using pretrained CLIP, on the Pokemon dataset.
finetuned CLIP/
- captions.csv - Captions for CLIP finetuning.
- clip_grid_search.py - Runs a grid search on CLIP for hyperparameter tuning.
- clip_grid_search_output - Output of the grid search.
- convert_txt_to_csv.py - Converts the captions text file to a CSV file.
- generate_captions.py - Generates captions for the Pokemon dataset.
- run.sbatch - Script for running the grid search.
plots/
- plot_resnet.ipynb - Plots for ResNet.
- plot_CLIP.ipynb - Plots for CLIP.
- text_for_plot.txt - Output of the best CLIP model found during the grid search.
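
For readers unfamiliar with the grid search mentioned under finetuned CLIP/, the idea can be sketched as an exhaustive loop over hyperparameter combinations that keeps the best-scoring one. This is a generic illustration, not the contents of clip_grid_search.py: the parameter names, their values, and `toy_evaluate` (a stand-in for "finetune CLIP and return a validation score") are all assumptions.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid (assumption): the real script may tune other settings.
param_grid = {"learning_rate": [1e-5, 1e-4], "batch_size": [16, 32]}

def toy_evaluate(params):
    # Stand-in for a finetuning run; peaks at learning_rate=1e-4, batch_size=32.
    return -abs(params["learning_rate"] - 1e-4) - abs(params["batch_size"] - 32) / 1000

best_params, best_score = grid_search(param_grid, toy_evaluate)
print(best_params)
```

The cost grows with the product of the number of values per hyperparameter, which is one reason such searches are often submitted as batch jobs.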

How to run

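Open a notebook in SLIP demo/ and run its cells in order: zero_shot_pretrained.ipynb uses pretrained CLIP, and zero_shot_finetuned.ipynb uses finetuned CLIP.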

Results

| Model Architecture | Accuracy |
| --- | --- |
| SLIP - pretrained only | 0.15 |
| SLIP - finetuned | 0.69 |
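
The README does not state how the accuracy figures were computed. A minimal sketch, assuming accuracy means the fraction of evaluation images whose predicted class matches the expected class, is shown below; the `accuracy` helper and the example labels are hypothetical.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# Hypothetical labels (assumption): three of the four predictions are correct.
predicted = ["pikachu", "bulbasaur", "pikachu", "eevee"]
expected = ["pikachu", "bulbasaur", "charmander", "eevee"]

score = accuracy(predicted, expected)
print(score)
```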

Sample output from SLIP

References

[1] Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A. C.; Lo, W.-Y.; Dollár, P.; and Girshick, R. 2023. Segment Anything. arXiv:2304.02643.

[2] Radford, A.; Kim, J. W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; Krueger, G.; and Sutskever, I. 2021. Learning Transferable Visual Models From Natural Language Supervision. arXiv:2103.00020.

[3] Contrastive Language-Image Pre-training

Authors

Arushi Arora: aa10350@nyu.edu
Saaketh Koundinya Gundavarapu: sg7729@nyu.edu
Shreya Agarwal: sa6981@nyu.edu
