Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
Updated Apr 10, 2024 (Python)
Image captioning model with a ResNet50 encoder and an LSTM decoder
Visual Elocution Synthesis
PyTorch implementation of 'CLIP' (Radford et al., 2021) from scratch and training it on Flickr8k + Flickr30k
Processes data produced by flickr30k_entities for use as region descriptions for the DenseCap model