# Build a Milvus Powered Text-Image Search Engine in Minutes

This notebook illustrates how to build a text-image search engine from scratch using [Milvus](https://milvus.io/). Milvus is an open-source vector database built for AI applications that supports nearest neighbor embedding search across tens of millions of entries. We'll walk through the text-image search procedure step by step. The core functionality takes only about a dozen lines of code, which you can use as a starting point for your own image search engine.

## Preparation
### Install Dependencies
First we need to install dependencies such as pymilvus, towhee, gradio and opencv-python.

In [6]:
! python -m pip install -q pymilvus towhee gradio opencv-python

### Prepare the data

The dataset used in this demo is a subset of the ImageNet dataset (100 classes, 10 images for each class), and it is available via [GitHub](https://github.com/towhee-io/examples/releases/download/data/reverse_image_search.zip).

The dataset is organized as follows:
- **train**: directory of candidate images;
- **test**: directory of test images;
- **reverse_image_search.csv**: a csv file containing an ***id***, ***path***, and ***label*** for each image;

Let's take a quick look:

In [1]:
! curl -L https://github.com/towhee-io/examples/releases/download/data/reverse_image_search.zip -O
! unzip -q -o reverse_image_search.zip



In [2]:
import pandas as pd

df = pd.read_csv('reverse_image_search.csv')
df.head()

Unnamed: 0,id,path,label
0,0,./train/brain_coral/n01917289_1783.JPEG,brain_coral
1,1,./train/brain_coral/n01917289_4317.JPEG,brain_coral
2,2,./train/brain_coral/n01917289_765.JPEG,brain_coral
3,3,./train/brain_coral/n01917289_1079.JPEG,brain_coral
4,4,./train/brain_coral/n01917289_2484.JPEG,brain_coral


To use the dataset for text-image search, let's first define a helper function:

- **read_images(results)**: read images from a list of image paths;

In [3]:
import cv2
from towhee.types.image import Image

id_img = df.set_index('id')['path'].to_dict()
def read_images(results):
    imgs = []
    for path in results:
        imgs.append(Image(cv2.imread(path)[:,:,::-1], 'RGB'))
    return imgs
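
The `[:,:,::-1]` slice in `read_images` is there because OpenCV loads images in BGR channel order, while the `Image` wrapper is told the data is `'RGB'`. A minimal sketch of what that slice does, using a hand-made 1x2 array as an assumed stand-in for the output of `cv2.imread`:

```python
import numpy as np

# A tiny 1x2 "image" in BGR order (assumed stand-in for what cv2.imread returns).
bgr = np.array([[[255, 0, 0], [0, 128, 64]]], dtype=np.uint8)

# Reversing the last axis swaps the channel order from BGR to RGB.
rgb = bgr[:, :, ::-1]

print(rgb[0, 0].tolist())  # the pure-blue BGR pixel now has blue in the last channel
```

The height and width are untouched; only the order of the three channel values per pixel changes.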


### Create a Milvus Collection

Before getting started, please make sure you have [installed Milvus](https://milvus.io/docs/v2.0.x/install_standalone-docker.md). Let's first create a `text_image_search` collection that uses the [L2 distance metric](https://milvus.io/docs/v2.0.x/metric.md#Euclidean-distance-L2) and an [IVF_FLAT index](https://milvus.io/docs/v2.0.x/index.md#IVF_FLAT).

In [10]:
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

def create_milvus_collection(collection_name, dim):
    connections.connect(host='127.0.0.1', port='19530')
    
    if utility.has_collection(collection_name):
        utility.drop_collection(collection_name)
    
    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=False),
        FieldSchema(name='url', dtype=DataType.VARCHAR, description='url of image path', max_length=200),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
    ]
    schema = CollectionSchema(fields=fields, description='text image search')
    collection = Collection(name=collection_name, schema=schema)

    # create IVF_FLAT index for collection.
    index_params = {
        'metric_type':'L2',
        'index_type':"IVF_FLAT",
        'params':{"nlist":512}
    }
    collection.create_index(field_name="embedding", index_params=index_params)
    return collection

collection = create_milvus_collection('text_image_search', 512)
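
To build some intuition for what an IVF-style index with an `nlist` parameter does, here is a toy sketch in NumPy. It is not Milvus's implementation: the function names `build_ivf` and `ivf_search` are hypothetical, the centroids are simply sampled from the data (real IVF indexes typically train them with k-means), and the data is random. The idea it shows is that vectors are grouped into `nlist` buckets and a query scans only the `nprobe` closest buckets instead of the whole collection.

```python
import numpy as np

def build_ivf(vectors, nlist, seed=0):
    """Toy inverted-file index: sample nlist data points as centroids, then bucket every vector."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=nlist, replace=False)]
    # Assign each vector to its nearest centroid under squared L2 distance.
    dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = dists.argmin(axis=1)
    buckets = [np.flatnonzero(assignments == c) for c in range(nlist)]
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, limit):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    centroid_d = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(centroid_d)[:nprobe]
    candidates = np.concatenate([buckets[c] for c in probe])
    cand_d = ((vectors[candidates] - query) ** 2).sum(axis=1)
    order = np.argsort(cand_d)[:limit]
    return candidates[order]

rng = np.random.default_rng(42)
data = rng.normal(size=(1000, 16)).astype(np.float32)  # random stand-in for embeddings
centroids, buckets = build_ivf(data, nlist=8)

query = data[3]
approx = ivf_search(query, data, centroids, buckets, nprobe=2, limit=5)  # scans 2 of 8 buckets
exact = ivf_search(query, data, centroids, buckets, nprobe=8, limit=5)   # scans every bucket
print(approx, exact)
```

Probing every bucket (`nprobe == nlist`) is equivalent to a brute-force scan, while a smaller `nprobe` trades some recall for less work per query.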

## Text Image Search

In this section, we'll show how to build our text-image search engine using Milvus. The basic idea behind our text-image search is to extract embeddings from images and texts using a deep neural network and compare the query embedding with those stored in Milvus.
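
The idea can be sketched without any model or database. In the example below the 3-dimensional vectors, the file names, and the `search` helper are all hypothetical stand-ins: real CLIP embeddings in this notebook have 512 dimensions, and Milvus takes the place of the Python dictionary.

```python
import numpy as np

# Hypothetical 3-dimensional "image embeddings" keyed by file name.
image_vecs = {
    "dog.jpg": np.array([0.9, 0.1, 0.0]),
    "cat.jpg": np.array([0.1, 0.9, 0.0]),
    "car.jpg": np.array([0.0, 0.1, 0.9]),
}

def search(query_vec, store, limit=2):
    """Return the `limit` stored keys closest to query_vec by L2 distance."""
    scored = [(float(np.linalg.norm(vec - query_vec)), name) for name, vec in store.items()]
    return [name for _, name in sorted(scored)[:limit]]

# Hypothetical embedding of a text query such as "a dog".
text_vec = np.array([0.8, 0.2, 0.1])
print(search(text_vec, image_vecs))
```

Because text and images are embedded into the same space, a text vector can be compared directly against image vectors; the closest images are the search results.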

We use [Towhee](https://towhee.io/), a machine learning framework for building data processing pipelines. It also provides predefined operators that implement the insert and query operations in Milvus.

<img src="./workflow.png" width = "60%" height = "60%" align=center />

### Generate image and text embeddings with CLIP


The `image_text_embedding.clip` operator extracts features from images or text with [CLIP](https://openai.com/blog/clip/), a model that generates embeddings for both modalities by jointly training an image encoder and a text encoder to maximize the cosine similarity of matching image-text pairs.

In [7]:
from towhee.dc2 import pipe, ops, DataCollection

In [2]:
img_pipe = (
    pipe.input('url')
    .map('url', 'img', ops.image_decode.cv2('RGB'))
    .map('img', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='image'))
    .output('img', 'vec')
)

DataCollection(img_pipe('./teddy.png')).show()

img,vec
,"[0.39526263, -0.7003882, -0.115270466, ...] shape=(512,)"


In [12]:
text_pipe = (
    pipe.input('text')
    .map('text', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='text'))
    .output('text', 'vec')
)

DataCollection(text_pipe('A teddybear on a skateboard in Times Square.')).show()

text,vec
A teddybear on a skateboard in Times Square.,"[-0.061550085, 0.1925929, -0.0052626375, ...] shape=(512,)"


Here is a detailed explanation of the code:

- `ops.image_decode.cv2()`: for each row from the data, read and decode the image at `url` and put the pixel data into column `img`;

- `ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='image'/'text')`: extract the image or text embedding with `image_text_embedding.clip`, an operator from the [Towhee hub](https://towhee.io/towhee/clip). This operator supports several models, including `clip_vit_base_patch16`, `clip_vit_base_patch32`, `clip_vit_large_patch14`, `clip_vit_large_patch14_336`, etc.

### Load Image Embeddings into Milvus

We first extract embeddings from the images with the `clip_vit_base_patch16` model and insert them into Milvus for indexing.

Here is a detailed explanation of the other APIs used in the code:

- `csv_reader(csv_path)`: read the csv file row by row, skip the header, and yield the `id` (converted from `str` to `int`) and `path` of each image;

- `.flat_map('csv_path', ('id', 'url'), csv_reader)`: turn each yielded pair into a row with the columns `id` and `url`;

- `.map('vec', 'vec', lambda x: x / np.linalg.norm(x))`: normalize each embedding to unit length;

- `ops.ann_insert.milvus_client(host='127.0.0.1', port='19530', collection_name='text_image_search')`: insert the `id`, `url`, and embedding of each image into Milvus;

In [11]:
import csv
import numpy as np

collection = create_milvus_collection('text_image_search', 512)
collection.load()
def csv_reader(csv_path):
    # Yield (id, path) for every data row, skipping the header line.
    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader)
        for item in reader:
            yield int(item[0]), item[1]

insert_pipe = (
    pipe.input('csv_path')
    .flat_map('csv_path', ('id', 'url'), csv_reader)
    .map('url', 'image', ops.image_decode.cv2())
    .map('image', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='image'), config={'device': 0})
    .map('vec', 'vec', lambda x: x / np.linalg.norm(x))
    .map(('id', 'url', 'vec'), (), ops.ann_insert.milvus_client(host='127.0.0.1', port='19530', collection_name='text_image_search'))
    .output()
)
insert_pipe('reverse_image_search.csv')

<towhee.runtime.data_queue.DataQueue at 0x7f38eb134ac0>

In [14]:
print('Total number of inserted data is {}.'.format(collection.num_entities))

Total number of inserted data is 1000.
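
The insert pipeline divides every embedding by its norm before storing it. One reason this pairs well with the L2 metric chosen for the collection: for unit-length vectors, the squared L2 distance equals `2 - 2 * cosine_similarity`, so ranking by L2 distance gives the same order as ranking by cosine similarity, which is what CLIP is trained on. A quick numerical check, using random vectors as an assumed stand-in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=512)  # random stand-ins for two 512-dimensional embeddings
b = rng.normal(size=512)

# Scale both vectors to unit length, as the pipeline's lambda does.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

cosine = float(a_unit @ b_unit)
squared_l2 = float(np.sum((a_unit - b_unit) ** 2))

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(np.isclose(squared_l2, 2 - 2 * cosine))
```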


### Query Matched Images from Milvus

Now that the embeddings for the candidate images have been inserted into Milvus, we can query it for nearest neighbors. Again, we use Towhee to take the input text, compute an embedding vector, and use that vector as a query for Milvus. Because Milvus returns only the IDs, distance values, and the requested `url` field, we use the `read_images` function to load the original images from their paths and display them.

In [13]:
from pathlib import Path
p_search = (
    pipe.input('text')
        .map('text', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='text'), config={'device': 0})
        .map('vec', 'vec', lambda x: x / np.linalg.norm(x))
        .map('vec', 'search_res', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', limit=5, collection_name="text_image_search", output_fields=['url']))
        .map('search_res', 'pred', lambda x: [str(Path(y[2]).resolve()) for y in x])
        .map('pred', 'pred_imgs', lambda x: read_images(x))
        .output('text','pred_imgs')
)

DataCollection(p_search('A black dog.')).show()

text,pred_imgs
A black dog.,
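
The `lambda` that produces `pred` in the pipeline above picks element `2` of every search hit and resolves it to an absolute path. A standalone sketch of that step follows; the helper name `hits_to_paths` is hypothetical, the hit layout `[id, distance, url]` is an assumption inferred from the `y[2]` indexing, and the distance values are made up.

```python
from pathlib import Path

def hits_to_paths(hits):
    """Take the third element (the `url` output field) of each hit and make it absolute."""
    return [str(Path(hit[2]).resolve()) for hit in hits]

# Hypothetical hits in the assumed [id, distance, url] layout; distances are invented.
hits = [
    [0, 1.42, "./train/brain_coral/n01917289_1783.JPEG"],
    [3, 1.47, "./train/brain_coral/n01917289_1079.JPEG"],
]
paths = hits_to_paths(hits)
print(paths)
```

The resulting list of absolute paths is what `read_images` receives in the next pipeline step.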


## Release a Showcase

We now have the core functionality of our text-image search engine working. It's time to build a showcase with an interface. [Gradio](https://gradio.app/) is a great tool for building demos. With Gradio, we simply need to wrap the data processing pipeline in a `milvus_search_function` function:

In [89]:
import gradio

def milvus_search_function(query):
    return p_search(query).get_dict()['pred_imgs']

interface = gradio.Interface(milvus_search_function, 
                             gradio.inputs.Textbox(lines=1),
                             [gradio.outputs.Image(type="file", label=None) for _ in range(5)]
                            )

interface.launch(inline=True, share=True)

DataCollection(p_search('A black dog.')).show()

IMPORTANT: You are using gradio version 3.3.1, however version 3.14.0 is available, please upgrade.
--------
Running on local URL:  http://127.0.0.1:7869
Running on public URL: https://bb191e2ae2474f33.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces: https://www.huggingface.co/spaces


text,pred_imgs
A black dog.,
