Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
Updated Apr 10, 2024 (Python)
Image captioning model with a ResNet50 encoder and an LSTM decoder
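The encoder-decoder pattern behind this repo can be sketched in a few lines: a CNN (here, ResNet50's pooled feature vector) initializes an LSTM that greedily emits caption tokens. The sketch below is a minimal NumPy illustration, not the repository's code; all weights are random stand-ins for a trained model, and the sizes (2048-dim features, a 10-word vocabulary) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2048 matches ResNet50's pooled output; the tiny
# hidden size and vocabulary keep the example readable.
FEAT, HID, VOCAB = 2048, 64, 10

# Randomly initialized weights stand in for a trained model.
W_init = rng.standard_normal((FEAT, HID)) * 0.01    # feature -> initial hidden
W_x = rng.standard_normal((VOCAB, 4 * HID)) * 0.01  # token one-hot -> gates
W_h = rng.standard_normal((HID, 4 * HID)) * 0.01    # hidden -> gates
W_out = rng.standard_normal((HID, VOCAB)) * 0.01    # hidden -> vocab logits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_onehot, h, c):
    """One LSTM cell step: input/forget/output gates plus candidate cell."""
    z = x_onehot @ W_x + h @ W_h
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def caption(image_feat, max_len=5, start_token=0):
    """Greedy decoding: the image feature seeds the LSTM's hidden state."""
    h = np.tanh(image_feat @ W_init)
    c = np.zeros(HID)
    token, out = start_token, []
    for _ in range(max_len):
        x = np.eye(VOCAB)[token]       # one-hot embedding of previous token
        h, c = lstm_step(x, h, c)
        token = int(np.argmax(h @ W_out))  # pick the most likely next word
        out.append(token)
    return out

tokens = caption(rng.standard_normal(FEAT))
print(tokens)  # a list of max_len token ids
```

A real implementation would replace the random weights with trained parameters, use learned word embeddings instead of one-hots, and stop at an end-of-sentence token.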
Yet another im2txt (Show and Tell: A Neural Image Caption Generator)
Generate captions from images
Comparative analysis of image captioning models using RNN, BiLSTM, and Transformer architectures on the Flickr8K dataset, with InceptionV3 for image feature extraction.
Text-Image-Text is a bidirectional system that enables seamless retrieval of images based on text descriptions, and vice versa. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
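Bidirectional text-image retrieval of this kind usually works by embedding both modalities into one shared space and ranking by cosine similarity in either direction. The sketch below assumes that shared space already exists; the random vectors are hypothetical stand-ins for the outputs of trained text and vision encoders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for learned encoders: in a real system a language model and a
# vision model would project captions and images into this shared space.
DIM = 8
text_emb = rng.standard_normal((3, DIM))   # 3 caption embeddings
image_emb = rng.standard_normal((3, DIM))  # 3 image embeddings

def normalize(m):
    # L2-normalize rows so dot products become cosine similarities.
    return m / np.linalg.norm(m, axis=1, keepdims=True)

# Cosine-similarity matrix: rows = captions, columns = images.
sim = normalize(text_emb) @ normalize(image_emb).T

text_to_image = sim.argmax(axis=1)  # best-matching image for each caption
image_to_text = sim.argmax(axis=0)  # best-matching caption for each image
print(text_to_image, image_to_text)
```

The same similarity matrix serves both retrieval directions, which is what makes the system bidirectional: one matrix, read row-wise for text-to-image and column-wise for image-to-text.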