So far, we have ingested all input data into an OpenSearch index. In this notebook, we will run multiple searches and collect the results to calculate accuracy metrics such as **precision** and **recall**.

In [1]:
import ast
import json
import requests
import pandas as pd

### Read the source dataset

In [2]:
df = pd.read_csv('../data/vgenome_sample_1k.csv', index_col=0)
df = df.rename(columns={'tags': 'tags_list'})
df['tags_list'] = df['tags_list'].apply(ast.literal_eval)
df['tags'] = df['tags_list'].apply(lambda x: " ".join(x))

df.head()

Unnamed: 0,image_id,image_url,tags_list,tags
53559,2366867,http://crowdfile.blob.core.chinacloudapi.cn/46...,"[birds, camera, bird, sky, neck, peach sky, me...",birds camera bird sky neck peach sky metal pol...
41981,2378980,http://crowdfile.blob.core.chinacloudapi.cn/46...,"[window, line, double door, front windshield, ...",window line double door front windshield back ...
89244,2329504,http://crowdfile.blob.core.chinacloudapi.cn/46...,"[window, head, man, flag, pole, light, tie, bu...",window head man flag pole light tie bus car sk...
71785,2347788,http://crowdfile.blob.core.chinacloudapi.cn/46...,"[headband, kid, child, whisk, visor, container...",headband kid child whisk visor container chips...
100014,2318253,http://crowdfile.blob.core.chinacloudapi.cn/46...,"[bear, eye, claws, face, grass, mouth, these, ...",bear eye claws face grass mouth these stone th...


### Prepare the judgement data

We will select the ten most common tags and use them as search queries.

In [4]:
# get the list of all tags
all_tags = []

for tag in df['tags_list'].values:
    all_tags += tag

len(all_tags)

17995

In [5]:
# count the number of occurrences (images) for each tag
occurrences = pd.Series(all_tags).value_counts()
occurrences[:10]

man         271
ground      226
sky         214
shirt       210
wall        208
tree        194
building    190
window      178
head        169
grass       157
Name: count, dtype: int64

In [6]:
select_tags = list(occurrences[:10].index.values)
select_tags

['man',
 'ground',
 'sky',
 'shirt',
 'wall',
 'tree',
 'building',
 'window',
 'head',
 'grass']

In addition, let's select five tags at random, which helps with negative testing. Including rarer tags in the evaluation set gives a more balanced picture of accuracy.

In [7]:
import random

random_tags = random.sample(all_tags, 5)
random_tags

['pole', 'collar', 'side', 'street light', 'woman']
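
One caveat with the sampling above: `all_tags` is the flattened list with repeats, so `random.sample(all_tags, 5)` picks frequent tags more often than rare ones, and it can return a tag that is already among the top ten (in which case the combined list ends up shorter than fifteen). The sketch below uses a toy tag list and a hypothetical helper, `sample_rare_tags`, to sample from the unique tags outside the most common ones instead. It illustrates one possible alternative, not what this notebook ran.

```python
import random
from collections import Counter

def sample_rare_tags(all_tags, n_common=10, n_random=5, seed=0):
    """Return the n_common most frequent tags plus n_random distinct other tags."""
    counts = Counter(all_tags)
    common = [tag for tag, _ in counts.most_common(n_common)]
    # sorted() keeps the candidate order stable, so a fixed seed is reproducible
    candidates = sorted(set(counts) - set(common))
    rng = random.Random(seed)  # the seed value is an arbitrary assumption
    picked = rng.sample(candidates, min(n_random, len(candidates)))
    return common, picked

# toy data: two frequent tags and four tags that occur once
toy_tags = ["man"] * 5 + ["sky"] * 4 + ["pole", "collar", "side", "woman"]
common, picked = sample_rare_tags(toy_tags, n_common=2, n_random=3)
print(common)  # the two most frequent toy tags
print(picked)  # three distinct tags, none of them in `common`
```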

In [8]:
# append the five random tags to the original list
select_tags = list(set(select_tags + random_tags))
select_tags

['wall',
 'street light',
 'pole',
 'shirt',
 'building',
 'grass',
 'sky',
 'head',
 'ground',
 'collar',
 'window',
 'woman',
 'man',
 'side',
 'tree']

In [9]:
len(select_tags)

15

We will use these fifteen tags to perform searches, collect results, and create the evaluation set.

#### Add the ground truth

For each tag in `select_tags`, let's record the list of images in which it is found. This is our **ground truth** for these fifteen tags.

In [11]:
judgement_data = {}

for search_tag in select_tags:
    matched_images = []
    # for each selected tag, iterate through the dataset
    for i, row in df.iterrows():
        # for each row (image) in the dataset, add the image if that tag was found in that image
        for available_tag in set(row['tags_list']):
            if available_tag == search_tag:
                matched_images.append(row['image_id'])
    judgement_data[search_tag] = matched_images
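
The nested loop above scans the whole dataset once per search tag. The same mapping can be built in a single pass. The sketch below uses plain Python on toy rows; the `rows` list and the helper name `build_judgement_data` are illustrative assumptions, not part of the notebook.

```python
from collections import defaultdict

def build_judgement_data(rows, select_tags):
    """Map each selected tag to the IDs of the images that carry it.

    rows: iterable of (image_id, tags_list) pairs.
    """
    wanted = set(select_tags)
    judgement = defaultdict(list)
    for image_id, tags_list in rows:
        # set() drops repeated tags, so an image is added once per tag
        for tag in set(tags_list) & wanted:
            judgement[tag].append(image_id)
    # keep every selected tag, even if no image carries it
    return {tag: judgement.get(tag, []) for tag in select_tags}

rows = [
    (1, ["sky", "bird", "sky"]),
    (2, ["man", "shirt"]),
    (3, ["sky", "man"]),
]
result = build_judgement_data(rows, ["sky", "man", "collar"])
print(result)
# {'sky': [1, 3], 'man': [2, 3], 'collar': []}
```

With the DataFrame from this notebook, the rows could be supplied as `zip(df['image_id'], df['tags_list'])`.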

In [12]:
# convert the dictionary into a dataframe
df_judgement = pd.DataFrame(list(judgement_data.items()),
                            columns=['search_tag', 'image_ids'])
df_judgement

Unnamed: 0,search_tag,image_ids
0,wall,"[285966, 832, 512, 2399605, 2377478, 2390176, ..."
1,street light,"[832, 2377478, 2357116, 2369703, 285812, 23340..."
2,pole,"[2329504, 2405367, 2318141, 2323433, 2403833, ..."
3,shirt,"[2329504, 2347788, 2325505, 2417733, 2338858, ..."
4,building,"[2329504, 150439, 2378215, 2377478, 2323433, 2..."
5,grass,"[2318253, 2325505, 2408241, 2370306, 2348450, ..."
6,sky,"[2366867, 2329504, 2417733, 2398437, 832, 2405..."
7,head,"[2329504, 2325505, 2417733, 2398437, 2318141, ..."
8,ground,"[2347788, 2318253, 2333971, 2318141, 2370306, ..."
9,collar,"[2338858, 2407115, 2330169, 2356769, 2360082, ..."


Let's also count the number of images in which each tag is found.

In [13]:
df_judgement['matched_image_ct'] = df_judgement['image_ids'].apply(len)

df_judgement

Unnamed: 0,search_tag,image_ids,matched_image_ct
0,wall,"[285966, 832, 512, 2399605, 2377478, 2390176, ...",208
1,street light,"[832, 2377478, 2357116, 2369703, 285812, 23340...",13
2,pole,"[2329504, 2405367, 2318141, 2323433, 2403833, ...",128
3,shirt,"[2329504, 2347788, 2325505, 2417733, 2338858, ...",210
4,building,"[2329504, 150439, 2378215, 2377478, 2323433, 2...",190
5,grass,"[2318253, 2325505, 2408241, 2370306, 2348450, ...",157
6,sky,"[2366867, 2329504, 2417733, 2398437, 832, 2405...",214
7,head,"[2329504, 2325505, 2417733, 2398437, 2318141, ...",169
8,ground,"[2347788, 2318253, 2333971, 2318141, 2370306, ...",226
9,collar,"[2338858, 2407115, 2330169, 2356769, 2360082, ...",20


### Perform Search

To perform a search, we first have to convert the search keyword (aka search tag) into a vector. This vector is then compared with the vector embeddings that are stored in the database.
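
To make the comparison step concrete, here is a minimal sketch of a nearest-neighbour lookup over toy three-dimensional vectors. It assumes cosine similarity as the measure; the similarity function actually configured for the index is not shown in this notebook, and real embeddings have far more dimensions. OpenSearch performs this step internally, so the code is only illustrative, and the names `cosine_similarity`, `top_k`, and `stored` are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# toy "index": document ID -> embedding
stored = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.6, 0.8, 0.0],
    "img_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]
nearest = [doc_id for doc_id, _ in top_k(query, stored, k=2)]
print(nearest)
# ['img_a', 'img_b']
```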

We already know the model ID (from the previous notebook), but let's capture it here explicitly in this notebook.

In [14]:
url = "http://localhost:9200/_plugins/_ml/models/_search"

headers = {'Content-Type': 'application/json'}

payload = {
    "query": {
        "match_all": {}
    },
    "size": 1
}

response = requests.post(url, headers=headers, data=json.dumps(payload))

model_id = response.json()['hits']['hits'][0]['_source']['model_id']
model_id

'CNlMZZQBuL5CuRNm06zQ'

This is the same model ID that we registered in the [second notebook](01_prepare_opensearch.ipynb).

Let's use this model as the `neural_query_enricher` in our pipeline.

In [15]:
url = "http://localhost:9200/_search/pipeline/default_model_pipeline"

payload = {
    "request_processors": [
        {
            "neural_query_enricher" : {
                "default_model_id": f"{model_id}",
            }
        }
    ]
}

response = requests.put(url, headers=headers, data=json.dumps(payload))

print(response.json())

{'acknowledged': True}


#### Trial search

Let's first run a trial search with the keyword "circle". In other words, we would like the search to return the top five images that contain the tag "circle" (or something similar to it).

In [17]:
url = "http://localhost:9200/tags_db/_search"

payload = {
    "_source": {
        "excludes": [
            "tag_embedding"
        ]
    },
    "query": {
        "neural": {
            "tag_embedding": {
                "query_text": "circle",
                "model_id": f"{model_id}",
                "k": 5
            }
        }
    }
}

response = requests.get(url, headers=headers, data=json.dumps(payload))

print(response.json())

{'took': 25, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 5, 'relation': 'eq'}, 'max_score': 0.429884, 'hits': [{'_index': 'tags_db', '_id': '2359132', '_score': 0.429884, '_source': {'tags': 'surfboard balcony pipes railing bicycle seat pattern bicycle basket round disk ceiling toy doll plaque gloves sun pattern tan wood wall orange flower writing stripes signatures bunch ground middle mustache grapes fence circle board points grape'}}, {'_index': 'tags_db', '_id': '2338356', '_score': 0.42776644, '_source': {'tags': 'sign pole directions human top tree background bicycle basket'}}, {'_index': 'tags_db', '_id': '2317802', '_score': 0.41676605, '_source': {'tags': 'cap sand kid head kite wheel path spokes man bicycle area air dress leaves girl sunglasses can kites sky people bike old tree log beach'}}, {'_index': 'tags_db', '_id': '2379616', '_score': 0.41057575, '_source': {'tags': 'head leg emblem man trees arm h

In [18]:
response.json()['hits']['hits']

[{'_index': 'tags_db',
  '_id': '2359132',
  '_score': 0.429884,
  '_source': {'tags': 'surfboard balcony pipes railing bicycle seat pattern bicycle basket round disk ceiling toy doll plaque gloves sun pattern tan wood wall orange flower writing stripes signatures bunch ground middle mustache grapes fence circle board points grape'}},
 {'_index': 'tags_db',
  '_id': '2338356',
  '_score': 0.42776644,
  '_source': {'tags': 'sign pole directions human top tree background bicycle basket'}},
 {'_index': 'tags_db',
  '_id': '2317802',
  '_score': 0.41676605,
  '_source': {'tags': 'cap sand kid head kite wheel path spokes man bicycle area air dress leaves girl sunglasses can kites sky people bike old tree log beach'}},
 {'_index': 'tags_db',
  '_id': '2379616',
  '_score': 0.41057575,
  '_source': {'tags': 'head leg emblem man trees arm helmet car headlight light moped dots cyclist car stripe people bike shirt traffic lights person road island sign scooter design light pole bicycle wheel van

So we got five results back. The first image contains the tag "circle". The second one doesn't, but it does contain somewhat similar things, such as "sign" and "bicycle". We are using a very small model, so we shouldn't expect highly accurate results.

In [19]:
# we can also grab all the matched image IDs 
matched_image_ids = [i['_id'] for i in response.json()['hits']['hits']]
matched_image_ids

['2359132', '2338356', '2317802', '2379616', '2321976']
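
Note that the `_id` values come back as strings, while the `image_id` values in the ground truth are integers, so they need converting before the two can be compared. A small helper can do the extraction and conversion in one place. The function name `extract_image_ids` is hypothetical, and the sample response is a trimmed-down copy of the structure shown above.

```python
def extract_image_ids(response_json):
    """Return the matched document IDs as integers, in ranked order.

    Returns an empty list when the response carries no hits section.
    """
    hits = response_json.get("hits", {}).get("hits", [])
    return [int(hit["_id"]) for hit in hits]

sample_response = {
    "hits": {
        "hits": [
            {"_index": "tags_db", "_id": "2359132", "_score": 0.429884},
            {"_index": "tags_db", "_id": "2338356", "_score": 0.42776644},
        ]
    }
}
ids = extract_image_ids(sample_response)
print(ids)                    # [2359132, 2338356]
print(extract_image_ids({}))  # []
```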

#### Perform search and collect results

In [20]:
# we will restrict the number of results to ten
num_search_results = 10

for i, row in df_judgement.iterrows():
    search_tag = row['search_tag']
    payload = {
        "_source": {
            "excludes": [
                "tag_embedding"
            ]
        },
        "query": {
            "neural": {
                "tag_embedding": {
                    "query_text": search_tag,
                    "model_id": f"{model_id}",
                    "k": num_search_results
                }
            }
        }
    }

    response = requests.get(url, headers=headers, data=json.dumps(payload))

    # grab all image IDs that matched (use `hit` to avoid reusing the row index name `i`)
    matched_image_ids = [hit['_id'] for hit in response.json()['hits']['hits']]

    # update the judgement dataset
    df_judgement.loc[i, 'search_results'] = str(matched_image_ids)

In [21]:
df_judgement.head()

Unnamed: 0,search_tag,image_ids,matched_image_ct,search_results
0,wall,"[285966, 832, 512, 2399605, 2377478, 2390176, ...",208,"['2356720', '2391643', '2331561', '2396458', '..."
1,street light,"[832, 2377478, 2357116, 2369703, 285812, 23340...",13,"['2346143', '2325260', '2357116', '2376249', '..."
2,pole,"[2329504, 2405367, 2318141, 2323433, 2403833, ...",128,"['2413632', '2365776', '2361335', '2385842', '..."
3,shirt,"[2329504, 2347788, 2325505, 2417733, 2338858, ...",210,"['2338592', '2415636', '2317318', '2390763', '..."
4,building,"[2329504, 150439, 2378215, 2377478, 2323433, 2...",190,"['2867', '2392594', '2365776', '2391643', '186..."


Let's convert the `search_results` column into a list.

In [22]:
df_judgement["search_results"] = df_judgement["search_results"].apply(
    lambda x: list(map(int, ast.literal_eval(x))))

df_judgement.head()

Unnamed: 0,search_tag,image_ids,matched_image_ct,search_results
0,wall,"[285966, 832, 512, 2399605, 2377478, 2390176, ...",208,"[2356720, 2391643, 2331561, 2396458, 2392594, ..."
1,street light,"[832, 2377478, 2357116, 2369703, 285812, 23340...",13,"[2346143, 2325260, 2357116, 2376249, 2369703, ..."
2,pole,"[2329504, 2405367, 2318141, 2323433, 2403833, ...",128,"[2413632, 2365776, 2361335, 2385842, 2401000, ..."
3,shirt,"[2329504, 2347788, 2325505, 2417733, 2338858, ...",210,"[2338592, 2415636, 2317318, 2390763, 2356510, ..."
4,building,"[2329504, 150439, 2378215, 2377478, 2323433, 2...",190,"[2867, 2392594, 2365776, 2391643, 1869, 4341, ..."


### Evals

We can now compare the search results (matched images) with the ground truth.

In [23]:
# count the number of correctly identified images

df_judgement["common_count"] = df_judgement.apply(
    lambda row: len(set(row["image_ids"]) & set(row["search_results"])), axis=1)

df_judgement

Unnamed: 0,search_tag,image_ids,matched_image_ct,search_results,common_count
0,wall,"[285966, 832, 512, 2399605, 2377478, 2390176, ...",208,"[2356720, 2391643, 2331561, 2396458, 2392594, ...",7
1,street light,"[832, 2377478, 2357116, 2369703, 285812, 23340...",13,"[2346143, 2325260, 2357116, 2376249, 2369703, ...",5
2,pole,"[2329504, 2405367, 2318141, 2323433, 2403833, ...",128,"[2413632, 2365776, 2361335, 2385842, 2401000, ...",8
3,shirt,"[2329504, 2347788, 2325505, 2417733, 2338858, ...",210,"[2338592, 2415636, 2317318, 2390763, 2356510, ...",8
4,building,"[2329504, 150439, 2378215, 2377478, 2323433, 2...",190,"[2867, 2392594, 2365776, 2391643, 1869, 4341, ...",9
5,grass,"[2318253, 2325505, 2408241, 2370306, 2348450, ...",157,"[2365362, 2351034, 2380367, 2336529, 2347309, ...",10
6,sky,"[2366867, 2329504, 2417733, 2398437, 832, 2405...",214,"[2326381, 2329873, 2365362, 1593, 2369158, 233...",10
7,head,"[2329504, 2325505, 2417733, 2398437, 2318141, ...",169,"[2342714, 2332438, 2398437, 2317685, 2318141, ...",9
8,ground,"[2347788, 2318253, 2333971, 2318141, 2370306, ...",226,"[2330697, 2401032, 2338592, 2380367, 2336468, ...",6
9,collar,"[2338858, 2407115, 2330169, 2356769, 2360082, ...",20,"[2339493, 2390598, 2417994, 2325266, 2318379, ...",5


#### Calculate precision and recall

In [24]:
# precision (at k): what fraction of the returned images are actually correct?
df_judgement['precision'] = df_judgement['common_count'] / num_search_results

# recall (at k): what fraction of the correct images are captured by the search?
df_judgement[f'recall_at_{num_search_results}'] = df_judgement['common_count'] / df_judgement['matched_image_ct']

df_judgement.head()

Unnamed: 0,search_tag,image_ids,matched_image_ct,search_results,common_count,precision,recall_at_10
0,wall,"[285966, 832, 512, 2399605, 2377478, 2390176, ...",208,"[2356720, 2391643, 2331561, 2396458, 2392594, ...",7,0.7,0.033654
1,street light,"[832, 2377478, 2357116, 2369703, 285812, 23340...",13,"[2346143, 2325260, 2357116, 2376249, 2369703, ...",5,0.5,0.384615
2,pole,"[2329504, 2405367, 2318141, 2323433, 2403833, ...",128,"[2413632, 2365776, 2361335, 2385842, 2401000, ...",8,0.8,0.0625
3,shirt,"[2329504, 2347788, 2325505, 2417733, 2338858, ...",210,"[2338592, 2415636, 2317318, 2390763, 2356510, ...",8,0.8,0.038095
4,building,"[2329504, 150439, 2378215, 2377478, 2323433, 2...",190,"[2867, 2392594, 2365776, 2391643, 1869, 4341, ...",9,0.9,0.047368
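
The same two metrics can be written as a standalone function, which makes the definitions easier to check by hand. This is a sketch with synthetic IDs: the helper name `precision_recall_at_k` is an assumption, and the toy numbers are chosen to mirror the "wall" row above (7 correct results out of 10, 208 relevant images).

```python
def precision_recall_at_k(relevant_ids, retrieved_ids, k):
    """Compute precision@k and recall@k for a single query.

    precision@k = correct results in the top k / k
    recall@k    = correct results in the top k / number of relevant items
    """
    relevant = set(relevant_ids)
    top_k = list(retrieved_ids)[:k]
    correct = len(relevant & set(top_k))
    precision = correct / k
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall

# synthetic data: 208 relevant IDs, 7 of the 10 retrieved IDs are relevant
relevant_ids = list(range(1, 209))
retrieved_ids = [1, 2, 3, 4, 5, 6, 7, 1001, 1002, 1003]

precision, recall = precision_recall_at_k(relevant_ids, retrieved_ids, k=10)
print(precision)         # 0.7
print(round(recall, 6))  # 0.033654
```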


#### Average precision and recall

In [27]:
print(f"Average Precision: {df_judgement['precision'].mean():.2f}")

print(f"Average Recall (at 10): {df_judgement['recall_at_10'].mean():.2f}")

Average Precision: 0.73
Average Recall (at 10): 0.08


The recall is very low, which shouldn't be surprising because we are returning only ten results from search. For a tag such as "wall", which appears in 208 images, recall at 10 cannot exceed 10/208 ≈ 0.05 even if every result is correct. Increasing the number of results would improve the recall (and, most likely, hurt precision). In practice, a similarity threshold should be used to restrict the number of results instead of a static number (like ten).
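
As a rough illustration of the threshold idea, the sketch below filters hits on the client side by their `_score`. The helper name `filter_by_score` and the cut-off value `0.42` are arbitrary assumptions for the example; a sensible threshold would have to be tuned against the judgement data. The scores are taken from the trial search output earlier in this notebook.

```python
def filter_by_score(hits, min_score):
    """Keep only the IDs of hits whose similarity score reaches min_score."""
    return [hit["_id"] for hit in hits if hit["_score"] >= min_score]

# first four hits from the trial search for "circle"
trial_hits = [
    {"_id": "2359132", "_score": 0.429884},
    {"_id": "2338356", "_score": 0.42776644},
    {"_id": "2317802", "_score": 0.41676605},
    {"_id": "2379616", "_score": 0.41057575},
]

kept = filter_by_score(trial_hits, min_score=0.42)
print(kept)
# ['2359132', '2338356']
```

With a threshold, the number of results varies per query, so precision would be computed against the number of results actually returned rather than a fixed `num_search_results`.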