[![image](https://raw.githubusercontent.com/visual-layer/visuallayer/main/imgs/vl_horizontal_logo.png)](https://www.visual-layer.com)

# Generating Image Captions With fastdup

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/caption_generation.ipynb)
[![Open in Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/caption_generation.ipynb)


This notebook shows how to use [fastdup](https://github.com/visual-layer/fastdup) to generate image captions. Caption generation has many practical uses, including zero-shot classification and accessibility features.
Additional examples in this notebook include visual question answering, which can be used for a number of applications such as image retrieval.

The captioning models used in this example can generally run on a CPU; no GPU is needed. The smallest model requires about 0.5 s per image caption, so 100,000 images can be captioned in roughly 14 hours.
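
As a rough back-of-the-envelope check of the throughput figure above, the sketch below assumes sequential, single-process captioning at a constant 0.5 s per image. The helper name `captioning_hours` is made up for this illustration and is not part of fastdup.

```python
def captioning_hours(num_images: int, seconds_per_image: float) -> float:
    """Return the total captioning time in hours for a sequential run."""
    return num_images * seconds_per_image / 3600


# Assumed inputs: 100,000 images at about 0.5 s per caption.
hours = captioning_hours(100_000, 0.5)
print(f"{hours:.1f} hours")  # 13.9 hours
```

Actual runtimes depend on the model, image size, and hardware, so treat the result as an order-of-magnitude estimate.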

## Install fastdup

First, install fastdup and verify the installation.

In [3]:
!pip install fastdup -Uq

Now, test the installation. If there's no error message, we are ready to go.

In [1]:
import fastdup
fastdup.__version__

'1.39'

## Load Dataset

In this example we will be using the [COCO Minitrain Dataset](https://github.com/giddyyupp/coco-minitrain), which is a curated mini training set of about 25,000 images (20% of the original COCO dataset).
We will download the dataset into our local drive.

In [None]:
!pip install gdown

In [14]:
# Download coco minitrain dataset.
!gdown --fuzzy https://drive.google.com/file/d/1iSXVTlkV1_DhdYpVDqsjlT4NJFQ7OkyK/view
!unzip -qq coco_minitrain_25k.zip

# Download csv annotations
!cd coco_minitrain_25k/annotations && gdown --fuzzy https://drive.google.com/file/d/1i12p23cXlqp1QrXjAD_vu467r4q67Mq9/view

Downloading...
From (uriginal): https://drive.google.com/uc?id=1iSXVTlkV1_DhdYpVDqsjlT4NJFQ7OkyK
From (redirected): https://drive.google.com/uc?id=1iSXVTlkV1_DhdYpVDqsjlT4NJFQ7OkyK&confirm=t&uuid=904b1819-dad6-43e5-a49c-b41056177666
To: /Users/guysinger/Desktop/fastdup/examples/coco_minitrain_25k.zip
100%|██████████████████████████████████████| 4.90G/4.90G [09:02<00:00, 9.03MB/s]
Downloading...
From: https://drive.google.com/uc?id=1i12p23cXlqp1QrXjAD_vu467r4q67Mq9
To: /Users/guysinger/Desktop/fastdup/examples/coco_minitrain_25k/annotations/coco_minitrain2017.csv
100%|██████████████████████████████████████| 9.43M/9.43M [00:01<00:00, 6.29MB/s]


### Load Annotations
We will load the annotations into fastdup's annotations format, using a simple converter to translate the COCO annotations into an annotations dataframe (this conversion can be used for any dataset that uses the COCO annotations format).

In [15]:
import pandas as pd

coco_csv = 'coco_minitrain_25k/annotations/coco_minitrain2017.csv'
coco_annotations = pd.read_csv(coco_csv, 
                               header=None, 
                               names=['filename', 'col_x', 'row_y',
                                      'width', 'height', 'label', 'ext'])

coco_annotations['split'] = 'train'  # Only train files were loaded
coco_annotations['filename'] = coco_annotations['filename'].apply(lambda x: 'images/train2017/'+x)
coco_annotations = coco_annotations.drop_duplicates()

In [16]:
coco_annotations.head(10)

|   | filename | col_x | row_y | width | height | label | ext | split |
|---|---|---|---|---|---|---|---|---|
| 0 | images/train2017/000000131075.jpg | 20.23 | 55.98 | 313.49 | 326.5 | tv | 0 | train |
| 1 | images/train2017/000000131075.jpg | 176.9 | 381.12 | 286.2 | 136.63 | laptop | 0 | train |
| 2 | images/train2017/000000131075.jpg | 369.96 | 361.35 | 72.76 | 73.91 | laptop | 0 | train |
| 3 | images/train2017/000000131075.jpg | 411.68 | 417.87 | 66.32 | 129.44 | chair | 0 | train |
| 4 | images/train2017/000000131075.jpg | 367.31 | 363.25 | 72.27 | 67.01 | tv | 0 | train |
| 5 | images/train2017/000000393223.jpg | 289.08 | 251.0 | 128.51 | 210.94 | toothbrush | 0 | train |
| 6 | images/train2017/000000393223.jpg | 2.16 | 1.98 | 566.29 | 478.02 | person | 0 | train |
| 7 | images/train2017/000000393228.jpg | 84.63 | 201.48 | 196.19 | 122.14 | elephant | 0 | train |
| 8 | images/train2017/000000393228.jpg | 309.93 | 266.51 | 106.39 | 56.62 | elephant | 0 | train |
| 9 | images/train2017/000000393228.jpg | 420.31 | 243.01 | 134.0 | 75.76 | elephant | 0 | train |
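
For datasets that ship their annotations as COCO JSON rather than as a CSV file, a conversion along the following lines may help. This is a minimal sketch, not fastdup's own converter: it assumes the standard COCO layout (`images`, `annotations`, `categories`, with boxes stored as `[x, y, width, height]`), and the function name `coco_to_rows` and the `image_prefix` argument are hypothetical.

```python
def coco_to_rows(coco: dict, image_prefix: str = "images/train2017/") -> list:
    """Flatten a COCO-style dict into one row per bounding box."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    labels = {cat["id"]: cat["name"] for cat in coco["categories"]}
    rows = []
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x_min, y_min, width, height]
        rows.append({
            "filename": image_prefix + file_names[ann["image_id"]],
            "col_x": x, "row_y": y, "width": w, "height": h,
            "label": labels[ann["category_id"]],
            "split": "train",  # assumption: only training images are loaded
        })
    return rows


# Tiny hand-written example in the COCO layout (box values taken from the output above).
example = {
    "images": [{"id": 131075, "file_name": "000000131075.jpg"}],
    "categories": [{"id": 72, "name": "tv"}],
    "annotations": [
        {"image_id": 131075, "category_id": 72, "bbox": [20.23, 55.98, 313.49, 326.5]},
    ],
}
rows = coco_to_rows(example)
print(rows[0]["filename"], rows[0]["label"])
```

Passing `rows` to `pd.DataFrame(rows)` would give a dataframe with the same column names as the one loaded from the CSV file.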


## Run fastdup

Run fastdup with annotations on the dataset. Here, we set `num_images` to limit the run to 100 images.

In [17]:
fd = fastdup.create(input_dir='./coco_minitrain_25k')
fd.run(annotations=coco_annotations, ccthreshold=0.9, num_images=100)



Downloading data files:   0%|          | 0/2 [24:35<?, ?it/s]


FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-09-13 12:28:24 [INFO] Going to loop over dir /var/folders/jx/kcrhkm990p3fnsy7bmv9nh500000gn/T/tmp5m96p81t.csv
2023-09-13 12:28:24 [INFO] Found total 100 images to run on, 100 train, 0 test, name list 100, counter 100 
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-09-13 12:28:26 [INFO] Going to loop over dir /var/folders/jx/kcrhkm990p3fnsy7bmv9nh500000gn/T/crops_input.csv
2023-09-13 12:28:26 [INFO] Found total 95 images to run on, 95 train, 0 test, name list 95, counter 95 
2023-09-13 12:28:27 [INFO] Found total 95 images to run on
Estimated: 0 Minutes
2023-09-13 12:28:27 [INFO] 14) Finished write_index() NN model
2023-09-13 12:28:27 [INFO] Stored nn model index file work_dir/nnf.index
2023-09-13 12:28:27 [INFO] Total time took 1025 ms
2023-09-13 12:28:27 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 % of total graph edges
2023-09-13 12:28:27 

0

## Generate Captions

Available models for captioning are:
- ViT-GPT2 : `'vitgpt2'`
- BLIP-2 : `'blip2'`
- BLIP : `'blip'`

Available models for VQA are:
- ViLT-b32: `'indoors_outdoors'` (currently only available for indoor/outdoor VQA)
- ViT-Age: `'age'` (currently only available for person age VQA)

If no model is specified, ViT-GPT2 is used by default.
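
To guard against typos in the model name before starting a long captioning run, a small check such as the one below can be used. It is only a sketch: the names are copied from the lists above, `resolve_model_name` is a hypothetical helper rather than a fastdup function, and other fastdup versions may accept additional names.

```python
# Model names as listed in this notebook; other fastdup versions may differ.
CAPTION_MODELS = ("vitgpt2", "blip2", "blip")
VQA_MODELS = ("indoors_outdoors", "age")


def resolve_model_name(name=None):
    """Return a model name from the lists above, falling back to 'vitgpt2'."""
    if name is None:
        return "vitgpt2"  # the default stated above
    known = CAPTION_MODELS + VQA_MODELS
    if name not in known:
        raise ValueError(f"Unknown model {name!r}; choose from {known}")
    return name


print(resolve_model_name())        # vitgpt2
print(resolve_model_name("blip"))  # blip

# With a fastdup object `fd` that has already been run, the call would then look like:
# captions_df = fd.caption(model_name=resolve_model_name("blip"))
```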

In [19]:
captions_df = fd.caption(model_name='automatic')

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.
100%|██████████| 95/95 [00:41<00:00,  2.29it/s]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['caption'] = generate_labels(df['filename'], modelname)


## Visualize Results

Use fastdup's built-in gallery methods to visualize the captioned images.
Additionally, captions can be generated for any gallery by setting the `label_col` argument to one of the available model names listed above.

In [22]:
to_show = captions_df.sample(20)
# Pair each image with itself so the outliers gallery can display its caption.
visualization_df = pd.DataFrame({'from': to_show['filename'], 'to': to_show['filename'],
                                 'label': to_show['caption'], 'distance': 0})
fastdup.create_outliers_gallery(visualization_df, save_path='.', num_images=10)

100%|██████████| 10/10 [00:00<00:00, 24272.59it/s]

Stored outliers visual view in  ./outliers.html





0

In [23]:
from IPython.display import HTML
HTML('outliers.html')

| Path | Distance | Label |
|---|---|---|
| coco_minitrain_25k/images/train2017/000000262187.jpg | 0 | a red and white plane on a runway |
| coco_minitrain_25k/images/train2017/000000262162.jpg | 0 | a bedroom with a desk and a bed |
| coco_minitrain_25k/images/train2017/000000524325.jpg | 0 | people flying kites on a beach |
| coco_minitrain_25k/images/train2017/000000524340.jpg | 0 | a cat laying on top of a couch |
| coco_minitrain_25k/images/train2017/000000131075.jpg | 0 | a computer screen on a desk with a monitor |
| coco_minitrain_25k/images/train2017/000000131107.jpg | 0 | a traffic light with a street light above it |
| coco_minitrain_25k/images/train2017/000000000025.jpg | 0 | a giraffe standing next to a tree in a field |
| coco_minitrain_25k/images/train2017/000000131152.jpg | 0 | a man riding a surfboard on top of a wave |
| coco_minitrain_25k/images/train2017/000000524375.jpg | 0 | a herd of cattle grazing on a lush green hillside |
| coco_minitrain_25k/images/train2017/000000000061.jpg | 0 | people riding on the backs of elephants |
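
As one illustration of what the generated captions can be used for, the sketch below runs a simple keyword search over a few of the captions shown above. It is plain Python and not part of fastdup; `search_captions` is a hypothetical helper, and exact word matching is an assumption made to keep the example short.

```python
# A few path/caption pairs copied from the gallery output above.
captions = {
    "coco_minitrain_25k/images/train2017/000000262187.jpg": "a red and white plane on a runway",
    "coco_minitrain_25k/images/train2017/000000524340.jpg": "a cat laying on top of a couch",
    "coco_minitrain_25k/images/train2017/000000000025.jpg": "a giraffe standing next to a tree in a field",
    "coco_minitrain_25k/images/train2017/000000000061.jpg": "people riding on the backs of elephants",
}


def search_captions(captions: dict, query: str) -> list:
    """Return paths whose caption contains every word of the query (case-insensitive)."""
    words = query.lower().split()
    return [
        path
        for path, text in captions.items()
        if all(word in text.lower().split() for word in words)
    ]


print(search_captions(captions, "cat"))
# ['coco_minitrain_25k/images/train2017/000000524340.jpg']
```

With a full `captions_df`, the same idea could be expressed as a string filter on the `caption` column.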


## Wrap Up

Next, feel free to check out these other tutorials:

+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!
+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.
+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze it for potential issues. If you have a labeled ImageNet-style folder structure, have a go!
+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try. 


## VL Profiler
If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com). VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser.

[Sign up](https://app.visual-layer.com) now, it's free.

[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/vl_profiler_promo.svg)](https://app.visual-layer.com)

As usual, feedback is welcome! 

Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues).