# Compare pre-trained CLIP models for text-image retrieval
> Create, deploy, feed, and evaluate the Vespa app using the Vespa Python API

## Install required packages

In [None]:
!pip install -r requirements.txt

In [None]:
%env IMG_DIR=.

## CLIP model

CLIP is available in several pre-trained variants. List the ones that can be loaded:

In [None]:
import clip

clip.available_models()

Each model might have a different embedding size. We need this information when creating the schema of a text-image search application.

In [None]:
embedding_info = {name: clip.load(name)[0].visual.output_dim for name in clip.available_models()}
embedding_info

{'RN50': 1024,
 'RN101': 512,
 'RN50x4': 640,
 'RN50x16': 768,
 'RN50x64': 1024,
 'ViT-B/32': 512,
 'ViT-B/16': 512,
 'ViT-L/14': 768,
 'ViT-L/14@336px': 768}
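The embedding size fixes the dimension of the tensor field that stores each model's image embeddings. The sketch below shows one way the sizes above could be turned into Vespa tensor type strings. The `field_prefix` helper and the `<prefix>_image` naming are assumptions made for illustration (this notebook only shows `rn50_image` for the `RN50` model); `create_text_image_app` may name its fields differently.

```python
import re

# Embedding sizes as reported by the cell above (subset shown for brevity).
embedding_info = {"RN50": 1024, "RN101": 512, "ViT-B/32": 512}


def field_prefix(model_name: str) -> str:
    """Hypothetical helper: turn a CLIP model name into a schema-safe prefix."""
    return re.sub(r"[^a-z0-9]+", "_", model_name.lower()).strip("_")


def image_tensor_type(dim: int) -> str:
    """Vespa type string for a dense float tensor with one indexed dimension."""
    return f"tensor<float>(x[{dim}])"


# One image field per model, sized according to that model's embedding.
image_fields = {
    f"{field_prefix(name)}_image": image_tensor_type(dim)
    for name, dim in embedding_info.items()
}

for field_name, tensor_type in image_fields.items():
    print(field_name, tensor_type)
```

Models that share an embedding size still get separate fields here, because their embedding spaces are not comparable.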

## Create and deploy a text-image search app

### Create the Vespa application package

The function `create_text_image_app` below uses [the Vespa python API](https://pyvespa.readthedocs.io/en/latest/) to create an application package with one field per CLIP model to store that model's image embeddings. It also declares the types of the text embeddings that we send along with the query when searching for images, and creates one ranking profile for each (text, image) embedding model.

In [None]:
from embedding import create_text_image_app

app_package = create_text_image_app(embedding_info)

We can inspect what the `schema` of the resulting application package looks like:

In [None]:
print(app_package.schema.schema_to_text)

### Deploy

In [None]:
import os
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker()
app = vespa_docker.deploy(application_package=app_package)

## Compute and feed image embeddings

For each of the CLIP models, compute the image embeddings and send them to the Vespa app.

In [None]:
from embedding import compute_and_send_image_embeddings

compute_and_send_image_embeddings(app=app, batch_size=128, clip_model_names=clip.available_models())

## Define the QueryModels to be evaluated

Create one `QueryModel` for each of the CLIP models. To do that, we need a function that takes a query string as input and returns the body of a Vespa query request. Here is an example:

In [None]:
from embedding import create_vespa_query_body_function

vespa_query_body_function = create_vespa_query_body_function("RN50")
vespa_query_body_function("this is a test query")["yql"]  # Only "yql" is shown; the body has more key-value pairs

'select * from sources * where ({"targetNumHits":100}nearestNeighbor(rn50_image,rn50_text))'
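To make the shape of such a body function concrete, here is a minimal sketch that reproduces the YQL string above. It is not the implementation of `create_vespa_query_body_function`: the factory name, the `embed_text` callable, the placeholder encoder, and the ranking profile name are all assumptions for illustration. The request keys (`yql`, `hits`, `ranking.profile`, `ranking.features.query(...)`) are standard Vespa query parameters.

```python
from typing import Callable, Dict, List


def make_query_body_function(
    field_prefix: str,
    embed_text: Callable[[str], List[float]],
    target_hits: int = 100,
) -> Callable[[str], Dict]:
    """Hypothetical stand-in for create_vespa_query_body_function."""
    image_field = f"{field_prefix}_image"
    text_tensor = f"{field_prefix}_text"

    def body_function(query: str) -> Dict:
        return {
            # Nearest neighbor search over the stored image embeddings.
            "yql": (
                "select * from sources * where "
                f'({{"targetNumHits":{target_hits}}}'
                f"nearestNeighbor({image_field},{text_tensor}))"
            ),
            "hits": target_hits,
            # Query tensor that nearestNeighbor compares against the image field.
            f"ranking.features.query({text_tensor})": embed_text(query),
            # Assumed profile name; the real app defines its own profiles.
            "ranking.profile": f"{field_prefix}_similarity",
        }

    return body_function


def fake_embed(text: str) -> List[float]:
    """Placeholder encoder; a real one would run the CLIP text encoder."""
    return [0.0] * 4


body = make_query_body_function("rn50", fake_embed)("this is a test query")
print(body["yql"])
```

Returning a closure keeps the model-specific names fixed while the query text varies, which is what allows one body function per CLIP model.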

With a way to create Vespa query body functions, we can create the `QueryModel`s used to evaluate each search configuration under test. Here, each query model represents the text-image representation of one CLIP model.

In [None]:
from learntorank.query import QueryModel

query_models = [QueryModel(
    name=model_name, 
    body_function=create_vespa_query_body_function(model_name)
) for model_name in clip.available_models()]

A query model contains all the information that is necessary to define how the search app will match and rank documents. We can use it to query the application.

In [None]:
from embedding import plot_images
from learntorank.query import send_query

query_result = send_query(app, query="a man surfing", query_model=query_models[3], hits=4)
plot_images(query_result, os.environ["IMG_DIR"])

## Evaluate

Now that there is one `QueryModel` for each available CLIP model, it is possible to evaluate and compare them.

Define search evaluation metrics:

In [None]:
from learntorank.evaluation import MatchRatio, Recall, ReciprocalRank

eval_metrics = [
    MatchRatio(), # Match ratio is just to show the % of documents that are matched by ANN
    Recall(at=100), 
    ReciprocalRank(at=100)
]
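As a reminder of what the two ranking metrics measure, the sketch below implements the textbook definitions of recall@k and reciprocal rank@k in plain Python. It is independent of `learntorank`, whose implementation may differ in details; the function names and the document ids are made up for the example.

```python
from typing import Sequence, Set


def recall_at(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def reciprocal_rank_at(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """1 / position of the first relevant document in the top-k, or 0 if none."""
    for position, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


# Illustrative ids only: the first relevant image is at position 3.
ranked = ["img_7", "img_2", "img_9", "img_4"]
relevant = {"img_9", "img_5"}

print(recall_at(ranked, relevant, k=3))
print(reciprocal_rank_at(ranked, relevant, k=3))
```

Recall rewards retrieving any relevant image within the cutoff, while reciprocal rank rewards placing the first relevant image as high as possible.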

Load the labeled data. A (caption, image) pair is treated as relevant only if all three experts agreed that the caption accurately describes the image.

In [None]:
from pandas import read_csv

labeled_data = read_csv("https://data.vespa.oath.cloud/blog/flickr8k/labeled_data.csv", sep="\t")
labeled_data.head()

Evaluate the application and return per-query results.

In [None]:
from learntorank.evaluation import evaluate

result = evaluate(
    app=app,
    labeled_data=labeled_data, 
    eval_metrics=eval_metrics, 
    query_model=query_models, 
    id_field="image_file_name",
    per_query=True
)
result.head()

Visualize RR@100:

In [None]:
import plotly.express as px
fig = px.box(result, x="model", y="reciprocal_rank_100")
fig.show()

Compute the mean and median RR@100 for each model:

In [None]:
result[["model", "reciprocal_rank_100"]].groupby(
    "model"
).agg(
    Mean=('reciprocal_rank_100', 'mean'), 
    Median=('reciprocal_rank_100', 'median')
)