<a href="https://colab.research.google.com/github/visualdatabase/fastdup/blob/main/examples/fastdup_wrong_labels.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fastdup for finding wrong or confusing labels 
In this notebook we identify wrong or confusing labels using image similarity. We first embed the images into short feature vectors and build a nearest-neighbor model using fastdup. We then apply a k-nearest-neighbor style check: each image is scored by the percentage of its nearest neighbors that share its label, so images that are more similar to other classes get a low score. The results are displayed using fastdup's `create_similarity_gallery()` method.
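
As a rough illustration of the scoring idea (a sketch, not fastdup's internal implementation), the snippet below takes a toy table of nearest-neighbor pairs and computes, for each image, the percentage of its neighbors that share its label. The file names, the labels, and the helper name `label_agreement` are made up for this example.

```python
import pandas as pd

def label_agreement(pairs: pd.DataFrame) -> pd.Series:
    """Percent of each image's neighbors that carry the same label as the image."""
    same = (pairs["label"] == pairs["label2"]).astype(float)
    return same.groupby(pairs["from"]).mean() * 100.0

# Toy neighbor table: one row per (image, neighbor) pair. All values are hypothetical.
pairs = pd.DataFrame({
    "from":   ["a.jpg", "a.jpg", "b.jpg", "b.jpg"],
    "to":     ["c.jpg", "d.jpg", "c.jpg", "d.jpg"],
    "label":  ["miso_soup", "miso_soup", "french_onion_soup", "french_onion_soup"],
    "label2": ["miso_soup", "miso_soup", "miso_soup", "miso_soup"],
})

scores = label_agreement(pairs)
print(scores.sort_values())  # lowest scores first: candidates for a label review
```

A low score only marks an image as a candidate for review; visually similar classes can produce low scores even when the label is correct.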

In [17]:
#install fastdup
!pip install -U --force-reinstall fastdup

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting fastdup
  Downloading fastdup-0.123-cp37-cp37m-manylinux_2_27_x86_64.whl (38.8 MB)
Collecting tqdm
  Downloading tqdm-4.64.0-py2.py3-none-any.whl (78 kB)
Collecting pandas
  Downloading pandas-1.3.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
Collecting pyyaml
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting opencv-python
  Downloading opencv_python-4.6.0.66-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (60.9 MB)
Collecting 

In [2]:
#download a subset of the food-101 dataset containing soup images
!gdown 1_iHtUfN8Q01y2PsOuTOclXELkY9gClEZ

Downloading...
From: https://drive.google.com/uc?id=1_iHtUfN8Q01y2PsOuTOclXELkY9gClEZ
To: /content/soups.zip
100% 135M/135M [00:01<00:00, 97.7MB/s]


In [3]:
!unzip -qq soups.zip

In [1]:
import fastdup

In [72]:
!rm -fr out
!python -c "import fastdup; fastdup.run('food-101', work_dir='out',nearest_neighbors_k=10,threshold=0.9)"

FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
On Jupyter notebook running on large datasets, there may be delay getting the console output. We recommend running using python shell.
Going to loop over dir food-101
Found total 3000 images to run on
Wrote total of 3000 features , found 0 bad images
Found total 3000 images to run on
1074) Finished write_index() NN model
Stored nn model index file out/nnf.index
1659343769 : INFO:     (add_vertices:460): Num vertices for group 0: 3000
1659343769 : INFO:     (commit_edge_buffer:609): In commit edge buffer (0,0)
1659343769 : INFO:     (commit_edge_buffer:680): Shuffling edges ...
1659343769 : INFO:     (commit_edge_buffer:688): Done shuffling edges in 0.007684 secs
1659343769 : INFO:     (commit_edge_buffer:692): Aggregating unique vertices...
1659343769 : INFO:     (commit_edge_buffer:705): Done aggregating unique vertex in 0.001379 secs
1659

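Before building the gallery it can help to look at the similarity file directly. The sketch below assumes `similarity.csv` has the `from`, `to` and `distance` columns that the gallery function in the next cell reads; to stay self-contained it writes a small stand-in file with hypothetical rows instead of reading `out/similarity.csv`. In the outputs shown in this notebook every `distance` value is at least 0.9, matching `threshold=0.9`, and larger values are listed first as the closest matches.

```python
import os
import tempfile
import pandas as pd

# Stand-in for out/similarity.csv; paths and values are hypothetical.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "similarity.csv")
pd.DataFrame({
    "from": ["food-101/miso_soup/1.jpg", "food-101/miso_soup/1.jpg", "food-101/miso_soup/2.jpg"],
    "to": ["food-101/miso_soup/2.jpg", "food-101/french_onion_soup/3.jpg", "food-101/miso_soup/1.jpg"],
    "distance": [0.95, 0.91, 0.95],
}).to_csv(csv_path, index=False)

sim = pd.read_csv(csv_path)

# How many neighbors passed the threshold for each query image?
neighbors_per_image = sim.groupby("from").size()

print(sim.sort_values("distance", ascending=False).head())
print(neighbors_per_image.describe())
```

Images with only one or two neighbors above the threshold give a noisy label score, which is worth keeping in mind when reading the galleries.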
In [83]:
import cv2
import numpy as np
import pandas as pd
from fastdup.image import plot_bounding_box, my_resize, imageformatter
from fastdup.galleries import slice_df
from tqdm import tqdm
import traceback
import os

def do_create_similarity_gallery(similarity_file, save_path, num_images=20, lazy_load=False, get_label_func=None,
                                 slice=None, max_width=None, descending=False, get_bounding_box_func =None,
                                 get_reformat_filename_func=None, get_extra_col_func=None):
    '''

    Function to create and display a gallery of images next to their most similar images, computed from the fastdup similarity file

    Parameters:
        similarity_file (str): csv file with the image similarities computed by the fastdup tool

        save_path (str): output folder location for the visuals

        num_images(int): Max number of images to display (default = 20). Be careful not to display too many images at once otherwise the notebook may go out of memory.

        lazy_load (boolean): If False, write all images inside html file using base64 encoding. Otherwise use lazy loading in the html to load images when mouse cursor is above the image (reduced html file size).

        get_label_func (callable): Optional parameter to allow adding more image information to the report like the image label. This is a function the user implements that gets the full file path and returns html string with the label or any other metadata desired.

        slice (str or list): Optional parameter to select a slice of the similarity file based on a specific label or a list of labels. The special value 'label_score' instead ranks images by the percentage of their nearest neighbors that share their label (requires get_label_func).

        max_width (int): Optional param to limit the image width

        descending (bool): Optional param to control the sort order

        get_bounding_box_func (callable): Optional parameter to allow adding bounding box to the image. This is a function the user implements that gets the full file path and returns a bounding box or an empty list if not available.

        get_reformat_filename_func (callable): Optional parameter to allow reformatting the filename before displaying it in the report. This is a function the user implements that gets the full file path and returns a string with the reformatted filename.

        get_extra_col_func (callable): Optional parameter to allow adding an extra column to the report. This is a function the user implements that gets the full file path and returns html string with any other metadata desired.

     '''


    from fastdup import generate_sprite_image
    img_paths = []
    img_paths2 = []
    info0 = []
    info = []
    label_score = []
    lengths = []

    df = pd.read_csv(similarity_file)
    assert len(df), "Failed to read stats file " + similarity_file

    if callable(get_label_func):
        df['label'] = df['from'].apply(lambda x: get_label_func(x))
        df['label2'] = df['to'].apply(lambda x: get_label_func(x))
        if slice != 'label_score':
          df = slice_df(df, slice)

    df = df.sort_values(['from','distance'], ascending= not descending)
    if 'label' in df.columns:
        top_labels = df.groupby('from')['label2'].apply(list)

    tos = df.groupby('from')['to'].apply(list)
    distances = df.groupby('from')['distance'].apply(list)

    if 'label' in df.columns:
        subdf = pd.DataFrame({'to':tos, 'label':top_labels,'distance':distances}).reset_index()
    else:
        subdf = pd.DataFrame({'to':tos, 'distance':distances}).reset_index()

    if slice is None or slice != 'label_score':
        subdf = subdf.head(num_images)
    else:
        for i, row in tqdm(subdf.iterrows(), total=len(subdf)):
            filename = row['from']
            label = get_label_func(filename)
            similar = [x==label for x in list(row['label'])]
            similar = 100.0*sum(similar)/(1.0*len(row['label']))
            lengths.append(len(row['label']))
            label_score.append(similar)
        subdf['score'] = label_score
        subdf['length'] = lengths
        print(subdf['score'].describe())
        subdf = subdf[subdf['length'] > 1]
        subdf = subdf.sort_values(['score','length'], ascending=not descending)
        subdf = subdf.head(num_images)
          
    for i, row in tqdm(subdf.iterrows(), total=min(num_images, len(subdf))):
        try:
            filename = row['from']
            label = get_label_func(filename) if callable(get_label_func) else None
            if callable(get_reformat_filename_func):
                new_filename = get_reformat_filename_func(filename)
            else:
                new_filename = filename

            if 'label' in row:
                info0_df = pd.DataFrame({'label':[label],'from':[new_filename]}).T
            else:
                info0_df = pd.DataFrame({'from':[new_filename]}).T

            info0.append(info0_df.to_html(header=False,escape=False).replace('\n',''))


            img = cv2.imread(filename)
            img = plot_bounding_box(img, get_bounding_box_func, filename)
            img = my_resize(img, max_width)

            imgpath = os.path.join(save_path, filename.replace('/',''))
            p, ext = os.path.splitext(imgpath)
            # os.path.splitext returns the extension with its leading dot (e.g. '.jpg')
            if ext is not None and ext != '' and ext.lower() not in ['.png','.tiff','.tif','.jpeg','.jpg','.gif']:
                imgpath += ".jpg"

            cv2.imwrite(imgpath, img)
            assert os.path.exists(imgpath), "Failed to save img to " + imgpath

            MAX_IMAGES = 10
            imgs = row['to'][:MAX_IMAGES]
            distances = row['distance'][:MAX_IMAGES]
            imgpath2 = f"{save_path}/to_image_{i}.jpg"
            info_df = pd.DataFrame({'distance':distances, 'to':imgs})


            if callable(get_reformat_filename_func):
              info_df['to'] = info_df['to'].apply(lambda x: get_reformat_filename_func(x))

            if 'label2' in df.columns:
                info_df['label'] = row['label'][:MAX_IMAGES]
            info_df = info_df.sort_values('distance',ascending=False)
            info.append(info_df.to_html(escape=False).replace('\n',''))

            h = max_width if max_width is not None else 0
            w = h
            generate_sprite_image(imgs, min(len(imgs), MAX_IMAGES), save_path, get_label_func, h, w, imgpath2, min(len(imgs),MAX_IMAGES), max_width=max_width)
            assert os.path.exists(imgpath2)

        except Exception as ex:
            traceback.print_exc()
            print("Failed to generate viz for images", filename, ex)
            imgpath = None
            imgpath2 = None

        img_paths.append(imgpath)
        img_paths2.append(imgpath2)

    import fastdup.html_writer
    if not lazy_load:
        subdf.insert(0, 'Image', [imageformatter(x, max_width) for x in img_paths])
        subdf.insert(0, 'Similar', [imageformatter(x, None) for x in img_paths2])
    else:
        img_paths3 = ["<img src=\"" + os.path.join(save_path, os.path.basename(x)) + "\" loading=\"lazy\">" for x in img_paths]
        img_paths4 = ["<img src=\"" + os.path.join(save_path, os.path.basename(x)) + "\" loading=\"lazy\">" for x in img_paths2]
        subdf.insert(0, 'Image', img_paths3)
        subdf.insert(0, 'Similar', img_paths4)

    subdf['info_to'] = info
    subdf['info_from'] = info0

    out_file = os.path.join(save_path, 'topk_similarity.html')
    title = 'Fastdup Tool - Similarity Image Report'
    if slice is not None:
        title += ", " + str(slice)

    cols = ['info_from','info_to', 'Image','Similar']
    if slice is not None and slice == 'label_score':
      cols = ['score'] + cols
    if callable(get_extra_col_func):
        subdf['extra'] = subdf['from'].apply(lambda x: get_extra_col_func(x))
        cols.append('extra')


    fastdup.html_writer.write_to_html_file(subdf[cols], title, out_file)
    assert os.path.exists(out_file), "Failed to generate out file " + out_file

    print("Stored similar images view in ", os.path.join(out_file))
    if not lazy_load:
        for i in img_paths:
            try:
                os.unlink(i)
            except Exception as ex:
                print("Failed to delete image file ", i, ex)

In [84]:
#return the folder name which is the label of the soup class
def my_label_func(x):
  return x.split('/')[-2]
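
The label function relies on the dataset layout: the class name is the folder that directly contains the image. A minimal equivalent using `os.path` is sketched below; the helper name `label_from_path` and the example path are assumptions made for illustration, and the actual folder depth inside `food-101` may differ.

```python
import os

def label_from_path(path: str) -> str:
    """Return the name of the directory that directly contains the file."""
    return os.path.basename(os.path.dirname(path))

example = "food-101/miso_soup/1096698.jpg"  # hypothetical layout
print(label_from_path(example))
```

Either version works only when every image sits in a folder named after its class; a flat folder of images would need a lookup table instead.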

In [85]:
do_create_similarity_gallery('out/similarity.csv','.',get_label_func=my_label_func,
                             get_reformat_filename_func=lambda x: os.path.basename(x),max_width=180,slice='label_score',descending=True)

100%|██████████| 430/430 [00:00<00:00, 7677.33it/s]


count    430.000000
mean      95.288575
std       17.943400
min        0.000000
25%      100.000000
50%      100.000000
75%      100.000000
max      100.000000
Name: score, dtype: float64


100%|██████████| 20/20 [00:02<00:00,  8.30it/s]


Stored similar images view in  ./topk_similarity.html


## First we look at the best images: those whose nearest neighbors all belong to their own class

In [86]:
from IPython.display import HTML
HTML('./topk_similarity.html')

[Rendered HTML report, images omitted. Columns: score, info_from, info_to, Image, Similar. The first row, 1151861.jpg (label hot_and_sour_soup), has score 100.0. Each block below lists one query image (label, from) followed by its nearest neighbors (distance, to, label).]

0,1
label,hot_and_sour_soup
from,1151861.jpg

Unnamed: 0,distance,to,label
0,0.921008,478316.jpg,hot_and_sour_soup
1,0.918352,3568665.jpg,hot_and_sour_soup
2,0.913725,204679.jpg,hot_and_sour_soup
3,0.913571,3567487.jpg,hot_and_sour_soup
4,0.912893,854589.jpg,hot_and_sour_soup
5,0.912827,2377494.jpg,hot_and_sour_soup
6,0.912424,387487.jpg,hot_and_sour_soup
7,0.909474,3706507.jpg,hot_and_sour_soup
8,0.90827,384751.jpg,hot_and_sour_soup
9,0.905464,611992.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1240584.jpg

Unnamed: 0,distance,to,label
0,0.915289,2524577.jpg,hot_and_sour_soup
1,0.910561,3879471.jpg,hot_and_sour_soup
2,0.909914,387487.jpg,hot_and_sour_soup
3,0.907982,154363.jpg,hot_and_sour_soup
4,0.907455,1653899.jpg,hot_and_sour_soup
5,0.906228,217569.jpg,hot_and_sour_soup
6,0.905829,3275927.jpg,hot_and_sour_soup
7,0.90565,2382833.jpg,hot_and_sour_soup
8,0.905075,3706507.jpg,hot_and_sour_soup
9,0.904391,2323446.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1363828.jpg

Unnamed: 0,distance,to,label
0,0.919803,3320353.jpg,hot_and_sour_soup
1,0.916993,2688464.jpg,hot_and_sour_soup
2,0.913018,2645587.jpg,hot_and_sour_soup
3,0.909605,943151.jpg,hot_and_sour_soup
4,0.909493,3265742.jpg,hot_and_sour_soup
5,0.909156,1577046.jpg,hot_and_sour_soup
6,0.9078,3244591.jpg,hot_and_sour_soup
7,0.906775,2524133.jpg,hot_and_sour_soup
8,0.906423,204679.jpg,hot_and_sour_soup
9,0.905364,3812573.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1502582.jpg

Unnamed: 0,distance,to,label
0,0.930921,661084.jpg,hot_and_sour_soup
1,0.928152,1577046.jpg,hot_and_sour_soup
2,0.926727,2688464.jpg,hot_and_sour_soup
3,0.92428,3673317.jpg,hot_and_sour_soup
4,0.918844,3004353.jpg,hot_and_sour_soup
5,0.91749,1347323.jpg,hot_and_sour_soup
6,0.916217,2790483.jpg,hot_and_sour_soup
7,0.914753,1449680.jpg,hot_and_sour_soup
8,0.914399,2368872.jpg,hot_and_sour_soup
9,0.910941,1316758.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1548700.jpg

Unnamed: 0,distance,to,label
0,0.927746,3496274.jpg,hot_and_sour_soup
1,0.915495,1126455.jpg,hot_and_sour_soup
2,0.914131,3601021.jpg,hot_and_sour_soup
3,0.913585,1936292.jpg,hot_and_sour_soup
4,0.909936,3918910.jpg,hot_and_sour_soup
5,0.907379,2259445.jpg,hot_and_sour_soup
6,0.904762,947500.jpg,hot_and_sour_soup
7,0.904525,113560.jpg,hot_and_sour_soup
8,0.904397,222193.jpg,hot_and_sour_soup
9,0.9039,1942754.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1577046.jpg

Unnamed: 0,distance,to,label
0,0.930355,3668485.jpg,hot_and_sour_soup
1,0.929584,2645587.jpg,hot_and_sour_soup
2,0.928152,1502582.jpg,hot_and_sour_soup
3,0.918787,2786483.jpg,hot_and_sour_soup
4,0.918694,3320353.jpg,hot_and_sour_soup
5,0.918674,2688464.jpg,hot_and_sour_soup
6,0.918008,1347323.jpg,hot_and_sour_soup
7,0.916206,3891611.jpg,hot_and_sour_soup
8,0.915349,3854857.jpg,hot_and_sour_soup
9,0.914477,2936421.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1805255.jpg

Unnamed: 0,distance,to,label
0,0.931257,3706507.jpg,hot_and_sour_soup
1,0.919753,2524577.jpg,hot_and_sour_soup
2,0.918175,3601021.jpg,hot_and_sour_soup
3,0.91657,2531145.jpg,hot_and_sour_soup
4,0.914502,1055929.jpg,hot_and_sour_soup
5,0.914248,1519845.jpg,hot_and_sour_soup
6,0.913849,2382833.jpg,hot_and_sour_soup
7,0.911644,209424.jpg,hot_and_sour_soup
8,0.909448,3567487.jpg,hot_and_sour_soup
9,0.907811,579590.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,204679.jpg

Unnamed: 0,distance,to,label
0,0.917905,854589.jpg,hot_and_sour_soup
1,0.91571,3891611.jpg,hot_and_sour_soup
2,0.915284,993036.jpg,hot_and_sour_soup
3,0.914944,3854857.jpg,hot_and_sour_soup
4,0.913725,1151861.jpg,hot_and_sour_soup
5,0.91364,579590.jpg,hot_and_sour_soup
6,0.913213,1168184.jpg,hot_and_sour_soup
7,0.911907,2323446.jpg,hot_and_sour_soup
8,0.91022,3706507.jpg,hot_and_sour_soup
9,0.90662,384751.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2256057.jpg

Unnamed: 0,distance,to,label
0,0.925518,2943259.jpg,hot_and_sour_soup
1,0.915164,2367229.jpg,hot_and_sour_soup
2,0.914575,3658920.jpg,hot_and_sour_soup
3,0.907244,497575.jpg,hot_and_sour_soup
4,0.905361,2344945.jpg,hot_and_sour_soup
5,0.904224,3706507.jpg,hot_and_sour_soup
6,0.903312,2531145.jpg,hot_and_sour_soup
7,0.90295,3286625.jpg,hot_and_sour_soup
8,0.902081,1805255.jpg,hot_and_sour_soup
9,0.901932,2273383.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2323446.jpg

Unnamed: 0,distance,to,label
0,0.931249,3854857.jpg,hot_and_sour_soup
1,0.918799,1168184.jpg,hot_and_sour_soup
2,0.917521,854589.jpg,hot_and_sour_soup
3,0.913954,3891611.jpg,hot_and_sour_soup
4,0.911907,204679.jpg,hot_and_sour_soup
5,0.90616,1577046.jpg,hot_and_sour_soup
6,0.904391,1240584.jpg,hot_and_sour_soup
7,0.904356,3376428.jpg,hot_and_sour_soup
8,0.904,3812573.jpg,hot_and_sour_soup
9,0.903447,3644289.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2344945.jpg

Unnamed: 0,distance,to,label
0,0.931675,2943259.jpg,hot_and_sour_soup
1,0.921287,3706507.jpg,hot_and_sour_soup
2,0.918073,3879326.jpg,hot_and_sour_soup
3,0.915152,2367229.jpg,hot_and_sour_soup
4,0.90538,1400511.jpg,hot_and_sour_soup
5,0.905361,2256057.jpg,hot_and_sour_soup
6,0.904321,3113531.jpg,hot_and_sour_soup
7,0.903756,1805255.jpg,hot_and_sour_soup
8,0.90249,2382833.jpg,hot_and_sour_soup
9,0.901968,1364912.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2377494.jpg

Unnamed: 0,distance,to,label
0,0.952566,3567487.jpg,hot_and_sour_soup
1,0.931618,3286625.jpg,hot_and_sour_soup
2,0.925734,3601021.jpg,hot_and_sour_soup
3,0.924538,3706507.jpg,hot_and_sour_soup
4,0.921522,209424.jpg,hot_and_sour_soup
5,0.919955,886216.jpg,hot_and_sour_soup
6,0.913699,993036.jpg,hot_and_sour_soup
7,0.912827,1151861.jpg,hot_and_sour_soup
8,0.911275,1000486.jpg,hot_and_sour_soup
9,0.910885,2524577.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2382833.jpg

Unnamed: 0,distance,to,label
0,0.926687,3706507.jpg,hot_and_sour_soup
1,0.926523,2524577.jpg,hot_and_sour_soup
2,0.924646,3216709.jpg,hot_and_sour_soup
3,0.915318,2531145.jpg,hot_and_sour_soup
4,0.913849,1805255.jpg,hot_and_sour_soup
5,0.913429,3601021.jpg,hot_and_sour_soup
6,0.909325,3567487.jpg,hot_and_sour_soup
7,0.90716,387487.jpg,hot_and_sour_soup
8,0.90565,1240584.jpg,hot_and_sour_soup
9,0.90369,2312653.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2524577.jpg

Unnamed: 0,distance,to,label
0,0.934362,3706507.jpg,hot_and_sour_soup
1,0.929131,3567487.jpg,hot_and_sour_soup
2,0.926523,2382833.jpg,hot_and_sour_soup
3,0.926425,3601021.jpg,hot_and_sour_soup
4,0.924212,387487.jpg,hot_and_sour_soup
5,0.919753,1805255.jpg,hot_and_sour_soup
6,0.918285,2312653.jpg,hot_and_sour_soup
7,0.916617,3155003.jpg,hot_and_sour_soup
8,0.915289,1240584.jpg,hot_and_sour_soup
9,0.910996,3879471.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2645587.jpg

Unnamed: 0,distance,to,label
0,0.932567,2943708.jpg,hot_and_sour_soup
1,0.929584,1577046.jpg,hot_and_sour_soup
2,0.924372,3244591.jpg,hot_and_sour_soup
3,0.917253,2688464.jpg,hot_and_sour_soup
4,0.913018,1363828.jpg,hot_and_sour_soup
5,0.910464,3673317.jpg,hot_and_sour_soup
6,0.907911,1502582.jpg,hot_and_sour_soup
7,0.90772,943151.jpg,hot_and_sour_soup
8,0.9076,2936421.jpg,hot_and_sour_soup
9,0.906782,1347323.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2688464.jpg

Unnamed: 0,distance,to,label
0,0.926727,1502582.jpg,hot_and_sour_soup
1,0.922781,3673317.jpg,hot_and_sour_soup
2,0.919307,943151.jpg,hot_and_sour_soup
3,0.918674,1577046.jpg,hot_and_sour_soup
4,0.918511,2786483.jpg,hot_and_sour_soup
5,0.918005,2790483.jpg,hot_and_sour_soup
6,0.917253,2645587.jpg,hot_and_sour_soup
7,0.916993,1363828.jpg,hot_and_sour_soup
8,0.9117,2450297.jpg,hot_and_sour_soup
9,0.910877,661084.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2922592.jpg

Unnamed: 0,distance,to,label
0,0.94289,3257336.jpg,hot_and_sour_soup
1,0.934865,2259445.jpg,hot_and_sour_soup
2,0.932036,3918910.jpg,hot_and_sour_soup
3,0.929941,3496274.jpg,hot_and_sour_soup
4,0.918179,3567487.jpg,hot_and_sour_soup
5,0.916446,1617113.jpg,hot_and_sour_soup
6,0.908622,2979433.jpg,hot_and_sour_soup
7,0.906401,2955708.jpg,hot_and_sour_soup
8,0.90523,2367229.jpg,hot_and_sour_soup
9,0.902332,1548700.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2936421.jpg

Unnamed: 0,distance,to,label
0,0.929444,3244591.jpg,hot_and_sour_soup
1,0.923474,3446030.jpg,hot_and_sour_soup
2,0.918943,2790483.jpg,hot_and_sour_soup
3,0.915255,3265742.jpg,hot_and_sour_soup
4,0.914477,1577046.jpg,hot_and_sour_soup
5,0.911077,1316758.jpg,hot_and_sour_soup
6,0.907736,2943708.jpg,hot_and_sour_soup
7,0.9076,2645587.jpg,hot_and_sour_soup
8,0.905812,2688464.jpg,hot_and_sour_soup
9,0.902998,1069789.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,3113531.jpg

Unnamed: 0,distance,to,label
0,0.919063,2367229.jpg,hot_and_sour_soup
1,0.918778,3552976.jpg,hot_and_sour_soup
2,0.910801,2377494.jpg,hot_and_sour_soup
3,0.90882,1633728.jpg,hot_and_sour_soup
4,0.907462,579590.jpg,hot_and_sour_soup
5,0.906733,3567487.jpg,hot_and_sour_soup
6,0.904321,2344945.jpg,hot_and_sour_soup
7,0.902579,3703194.jpg,hot_and_sour_soup
8,0.901377,3204149.jpg,hot_and_sour_soup
9,0.901174,2278783.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,3286625.jpg

Unnamed: 0,distance,to,label
0,0.931618,2377494.jpg,hot_and_sour_soup
1,0.930623,1167380.jpg,hot_and_sour_soup
2,0.930502,1955928.jpg,hot_and_sour_soup
3,0.924984,3567487.jpg,hot_and_sour_soup
4,0.924958,886216.jpg,hot_and_sour_soup
5,0.921308,3819269.jpg,hot_and_sour_soup
6,0.920385,3706507.jpg,hot_and_sour_soup
7,0.919707,209424.jpg,hot_and_sour_soup
8,0.917541,942430.jpg,hot_and_sour_soup
9,0.910784,1959025.jpg,hot_and_sour_soup


## Next we examine confusing images whose nearest neighbors mostly belong to other classes

In [87]:
do_create_similarity_gallery('out/similarity.csv','.',get_label_func=my_label_func,
                             get_reformat_filename_func=lambda x: os.path.basename(x),max_width=180,slice='label_score',descending=False)

100%|██████████| 430/430 [00:00<00:00, 8131.87it/s]


count    430.000000
mean      95.288575
std       17.943400
min        0.000000
25%      100.000000
50%      100.000000
75%      100.000000
max      100.000000
Name: score, dtype: float64


100%|██████████| 20/20 [00:00<00:00, 23.98it/s]


Stored similar images view in  ./topk_similarity.html


In [88]:
HTML('./topk_similarity.html')

[Rendered HTML report, images omitted. Columns: score, info_from, info_to, Image, Similar. The first two rows, 2796610.jpg and 801366.jpg (both labeled french_onion_soup), have score 0.000000. Each block below lists one query image (label, from) followed by its nearest neighbors (distance, to, label).]

0,1
label,french_onion_soup
from,2796610.jpg

Unnamed: 0,distance,to,label
1,0.905198,1653899.jpg,hot_and_sour_soup
0,0.902555,2563123.jpg,hot_and_sour_soup

0,1
label,french_onion_soup
from,801366.jpg

Unnamed: 0,distance,to,label
1,0.919133,2941351.jpg,hot_and_sour_soup
0,0.902568,1158432.jpg,hot_and_sour_soup

0,1
label,french_onion_soup
from,2455859.jpg

Unnamed: 0,distance,to,label
2,0.911707,2531145.jpg,hot_and_sour_soup
1,0.906494,2943259.jpg,hot_and_sour_soup
0,0.901001,2382833.jpg,hot_and_sour_soup

0,1
label,french_onion_soup
from,2962330.jpg

Unnamed: 0,distance,to,label
3,0.921225,1316758.jpg,hot_and_sour_soup
2,0.911245,3244591.jpg,hot_and_sour_soup
1,0.911134,1347323.jpg,hot_and_sour_soup
0,0.902943,2936421.jpg,hot_and_sour_soup

0,1
label,french_onion_soup
from,2582633.jpg

Unnamed: 0,distance,to,label
7,0.918497,3668485.jpg,hot_and_sour_soup
6,0.91596,2998002.jpg,hot_and_sour_soup
5,0.912746,2786483.jpg,hot_and_sour_soup
4,0.905066,1577046.jpg,hot_and_sour_soup
3,0.904618,1502582.jpg,hot_and_sour_soup
2,0.904602,154363.jpg,hot_and_sour_soup
1,0.902539,3673317.jpg,hot_and_sour_soup
0,0.901111,2172589.jpg,hot_and_sour_soup

0,1
label,french_onion_soup
from,2482062.jpg

Unnamed: 0,distance,to,label
9,0.936405,1406237.jpg,hot_and_sour_soup
8,0.931915,3567487.jpg,hot_and_sour_soup
7,0.918301,2367229.jpg,hot_and_sour_soup
6,0.917525,3706507.jpg,hot_and_sour_soup
5,0.917102,1400511.jpg,hot_and_sour_soup
4,0.909024,3118313.jpg,hot_and_sour_soup
3,0.908065,2377494.jpg,hot_and_sour_soup
2,0.905536,854589.jpg,hot_and_sour_soup
1,0.90351,1617113.jpg,hot_and_sour_soup
0,0.902777,148669.jpg,hot_and_sour_soup

0,1
label,french_onion_soup
from,2623210.jpg

Unnamed: 0,distance,to,label
9,0.925362,154363.jpg,hot_and_sour_soup
8,0.920308,1617113.jpg,hot_and_sour_soup
7,0.919012,3567487.jpg,hot_and_sour_soup
6,0.909141,2524577.jpg,hot_and_sour_soup
5,0.907853,3891611.jpg,hot_and_sour_soup
4,0.906728,442387.jpg,hot_and_sour_soup
3,0.905475,3879471.jpg,hot_and_sour_soup
2,0.905036,539296.jpg,french_onion_soup
1,0.904863,854589.jpg,hot_and_sour_soup
0,0.904348,1240584.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1158432.jpg

Unnamed: 0,distance,to,label
1,0.912264,2941351.jpg,hot_and_sour_soup
0,0.902568,801366.jpg,french_onion_soup

0,1
label,hot_and_sour_soup
from,200587.jpg

Unnamed: 0,distance,to,label
1,0.915139,3118313.jpg,hot_and_sour_soup
0,0.902053,2482062.jpg,french_onion_soup

0,1
label,french_onion_soup
from,539296.jpg

Unnamed: 0,distance,to,label
5,0.924186,883844.jpg,french_onion_soup
4,0.911474,2184467.jpg,hot_and_sour_soup
3,0.910285,154363.jpg,hot_and_sour_soup
2,0.907492,3004353.jpg,hot_and_sour_soup
1,0.905036,2623210.jpg,french_onion_soup
0,0.900526,759629.jpg,french_onion_soup

0,1
label,hot_and_sour_soup
from,382979.jpg

Unnamed: 0,distance,to,label
2,0.907557,154363.jpg,hot_and_sour_soup
1,0.901541,2005531.jpg,hot_and_sour_soup
0,0.900588,1020156.jpg,french_onion_soup

0,1
label,hot_and_sour_soup
from,442387.jpg

Unnamed: 0,distance,to,label
2,0.906728,2623210.jpg,french_onion_soup
1,0.901407,154363.jpg,hot_and_sour_soup
0,0.900856,854589.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,497575.jpg

Unnamed: 0,distance,to,label
2,0.907244,2256057.jpg,hot_and_sour_soup
1,0.901151,2482062.jpg,french_onion_soup
0,0.900538,2367229.jpg,hot_and_sour_soup

0,1
label,miso_soup
from,1096698.jpg

Unnamed: 0,distance,to,label
2,0.910387,451876.jpg,miso_soup
1,0.90893,2786483.jpg,hot_and_sour_soup
0,0.907912,343304.jpg,miso_soup

0,1
label,hot_and_sour_soup
from,154363.jpg

Unnamed: 0,distance,to,label
9,0.925362,2623210.jpg,french_onion_soup
8,0.920773,3879471.jpg,hot_and_sour_soup
7,0.91906,387487.jpg,hot_and_sour_soup
6,0.910854,1670529.jpg,hot_and_sour_soup
5,0.910285,539296.jpg,french_onion_soup
4,0.909794,2516789.jpg,hot_and_sour_soup
3,0.907982,1240584.jpg,hot_and_sour_soup
2,0.907557,382979.jpg,hot_and_sour_soup
1,0.904602,2582633.jpg,french_onion_soup
0,0.90335,1617113.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,148669.jpg

Unnamed: 0,distance,to,label
3,0.902777,2482062.jpg,french_onion_soup
2,0.901723,2495594.jpg,hot_and_sour_soup
1,0.900802,3452669.jpg,hot_and_sour_soup
0,0.90009,564763.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1942754.jpg

Unnamed: 0,distance,to,label
3,0.910951,3613554.jpg,hot_and_sour_soup
2,0.9039,1548700.jpg,hot_and_sour_soup
1,0.903487,3428336.jpg,hot_and_sour_soup
0,0.90348,2623210.jpg,french_onion_soup

0,1
label,hot_and_sour_soup
from,2955708.jpg

Unnamed: 0,distance,to,label
7,0.911116,3918910.jpg,hot_and_sour_soup
6,0.908651,3257336.jpg,hot_and_sour_soup
5,0.907947,3567487.jpg,hot_and_sour_soup
4,0.906401,2922592.jpg,hot_and_sour_soup
3,0.903105,2259445.jpg,hot_and_sour_soup
2,0.902435,1903256.jpg,miso_soup
1,0.900988,2482062.jpg,french_onion_soup
0,0.900582,3496274.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,1226605.jpg

Unnamed: 0,distance,to,label
4,0.91572,3004353.jpg,hot_and_sour_soup
3,0.909361,2172589.jpg,hot_and_sour_soup
2,0.90414,3454724.jpg,miso_soup
1,0.903955,2786483.jpg,hot_and_sour_soup
0,0.900279,661084.jpg,hot_and_sour_soup

0,1
label,hot_and_sour_soup
from,2172589.jpg

Unnamed: 0,distance,to,label
4,0.909361,1226605.jpg,hot_and_sour_soup
3,0.909229,2786483.jpg,hot_and_sour_soup
2,0.903488,3565393.jpg,hot_and_sour_soup
1,0.902125,3879471.jpg,hot_and_sour_soup
0,0.901111,2582633.jpg,french_onion_soup
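
One possible follow-up, sketched here as an assumption rather than as something fastdup does, is to turn the neighbor labels into a suggested label by majority vote. The helper name `suggest_label` is hypothetical; the neighbor labels are copied from two rows of the report above.

```python
from collections import Counter

def suggest_label(neighbor_labels):
    """Return (most common neighbor label, share of neighbors carrying it)."""
    label, count = Counter(neighbor_labels).most_common(1)[0]
    return label, count / len(neighbor_labels)

# image -> (current label, labels of its nearest neighbors)
rows = {
    "2796610.jpg": ("french_onion_soup", ["hot_and_sour_soup", "hot_and_sour_soup"]),
    "1096698.jpg": ("miso_soup", ["miso_soup", "hot_and_sour_soup", "miso_soup"]),
}

for name, (current, neighbors) in rows.items():
    suggested, share = suggest_label(neighbors)
    flag = "review" if suggested != current else "ok"
    print(f"{name}: current={current} suggested={suggested} share={share:.2f} -> {flag}")
```

A disagreement marks an image for manual review; it is not proof that the label is wrong, especially when only two or three neighbors passed the similarity threshold.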
