# Module 2: Image Search

In this module, we generate embedding vectors for images, which requires a multimodal model. [Titan Multimodal Embeddings G1](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html) in Amazon Bedrock is a good fit because it embeds both images and text into the same vector space. We will use `Titan Multimodal Embeddings G1` and TiDB Serverless Vector Search to complete this module.

We will use the embedding model to encode each image into a vector and store the vectors in TiDB Serverless. Then we will use the same model to encode a text query and search TiDB Serverless for the most similar images.
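
To make the idea concrete, here is a minimal, self-contained sketch of what "most similar" means once images and text live in the same vector space. It assumes cosine similarity as the metric and uses made-up 3-dimensional vectors; the helper names `cosine_similarity` and `top_k` are illustrative and are not part of pytidb or Bedrock. Real Titan embeddings have far more dimensions, and TiDB performs this ranking for us inside the database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, image_vecs, k=2):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in image_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors invented for illustration; they are NOT real model output.
toy_image_vecs = {
    "shiba_inu_15.jpg": [0.9, 0.1, 0.0],
    "shiba_inu_16.jpg": [0.8, 0.2, 0.1],
    "scottish_terrier_166.jpg": [0.1, 0.9, 0.2],
    "scottish_terrier_161.jpg": [0.0, 0.8, 0.3],
}
toy_query_vec = [1.0, 0.0, 0.0]  # stands in for the embedding of a text query

ranked = top_k(toy_query_vec, toy_image_vecs, k=2)
print([name for name, _ in ranked])
```

With these toy values, the two vectors pointing in nearly the same direction as the query rank first, which is the same behavior we rely on later when searching with a text query.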

## Install dependencies


In [1]:
%pip install -q \
    pytidb==0.0.10.dev1 \
    boto3==1.38.23 \
    litellm \
    ipyplot

In [2]:
import dotenv

dotenv.load_dotenv()

## Download dataset

In [3]:
import os
import requests
import ipyplot

os.makedirs("pet_images", exist_ok=True)

pet_images = [
    "scottish_terrier_166.jpg",
    "scottish_terrier_161.jpg",
    "shiba_inu_15.jpg",
    "shiba_inu_16.jpg",
]

base_url = "https://raw.githubusercontent.com/pingcap/pytidb/main/tests/fixtures/pet_images/"

local_image_paths = []
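for img in pet_images:
    url = base_url + img
    local_path = os.path.join("pet_images", img)
    # Fail fast on HTTP errors instead of saving an error page as an image
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(local_path, "wb") as f:
        f.write(response.content)
    local_image_paths.append(local_path)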

ipyplot.plot_images(local_image_paths, labels=pet_images, max_images=10, img_width=200, force_b64=True)

## Initialize the Database and Table

> **Note:**
>
> - The `SERVERLESS_CLUSTER_HOST`, `SERVERLESS_CLUSTER_PORT`, `SERVERLESS_CLUSTER_USERNAME`, `SERVERLESS_CLUSTER_PASSWORD`, and `SERVERLESS_CLUSTER_DATABASE_NAME` environment variables are already set for this lab.
> - Permission to use Amazon Bedrock has also been granted for this lab. If you run this code outside the TiDB Labs platform, set these environment variables and configure Amazon Bedrock access beforehand.
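
If you run the notebook outside the lab, a quick check like the following can confirm the connection settings are present before you try to connect. This is a small sketch: the helper `missing_env_vars` is a hypothetical name introduced here for illustration, and the variable names are the ones listed in the note above.

```python
import os

# Variable names taken from the note above.
REQUIRED_ENV_VARS = [
    "SERVERLESS_CLUSTER_HOST",
    "SERVERLESS_CLUSTER_PORT",
    "SERVERLESS_CLUSTER_USERNAME",
    "SERVERLESS_CLUSTER_PASSWORD",
    "SERVERLESS_CLUSTER_DATABASE_NAME",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All TiDB connection settings are present.")
```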

## Initialize the Multimodal Embedding Model

In [4]:
import os

from litellm import completion
from typing import Optional, Any
from pytidb import TiDBClient
from pytidb.schema import TableModel, Field
from pytidb.embeddings import EmbeddingFunction

embedding_model = "bedrock/amazon.titan-embed-image-v1"
llm_model = "bedrock/us.amazon.nova-lite-v1:0"

multimodel_embedding_function = EmbeddingFunction(
    embedding_model,
    timeout=60
)
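
For intuition about what an embedding request to this model looks like, the sketch below builds a request body for `amazon.titan-embed-image-v1` and shows how it could be sent with `boto3` directly. This is an illustration under assumptions, not a description of how `EmbeddingFunction` or litellm work internally: the helper names `build_titan_request` and `embed_with_bedrock` and the `us-east-1` region are hypothetical choices made here, and the actual call needs AWS credentials with Bedrock access, so it is defined but not executed.

```python
import base64
import json

MODEL_ID = "amazon.titan-embed-image-v1"


def build_titan_request(image_bytes=None, text=None, dimensions=1024):
    """Build the JSON body for a Titan Multimodal Embeddings request."""
    if image_bytes is None and text is None:
        raise ValueError("Provide image_bytes, text, or both.")
    body = {"embeddingConfig": {"outputEmbeddingLength": dimensions}}
    if text is not None:
        body["inputText"] = text
    if image_bytes is not None:
        # The model expects the image as a base64-encoded string.
        body["inputImage"] = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps(body)


def embed_with_bedrock(request_body, region_name="us-east-1"):
    """Send the request to Bedrock (assumed region; requires AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("bedrock-runtime", region_name=region_name)
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=request_body,
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read())["embedding"]


# Only the request body is built here; no network call is made.
request_body = build_titan_request(text="shiba inu")
print(request_body)
```

The same model accepts text, an image, or both, which is why a text query can later be compared against stored image vectors.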

In [5]:
db = TiDBClient.connect(
    host=os.getenv("SERVERLESS_CLUSTER_HOST"),
    # TiDB listens on port 4000 by default; fall back to it if the variable is unset
    port=int(os.getenv("SERVERLESS_CLUSTER_PORT", "4000")),
    username=os.getenv("SERVERLESS_CLUSTER_USERNAME"),
    password=os.getenv("SERVERLESS_CLUSTER_PASSWORD"),
    database=os.getenv("SERVERLESS_CLUSTER_DATABASE_NAME"),
    enable_ssl=True,
)

table_name = "image_search"
class ImageSearch(TableModel, table=True):
    __tablename__ = table_name
    __table_args__ = {"extend_existing": True}
    id: int | None = Field(default=None, primary_key=True)
    image_uri: str = Field()
    image_vec: list[float] = multimodel_embedding_function.VectorField(
        source_field="image_uri",
        source_type="image"
    )
table = db.create_table(schema=ImageSearch, if_exists="overwrite")

## Insert images

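Store the images and their corresponding image embeddings in TiDB Serverless. Because `image_vec` is declared with `source_field="image_uri"`, the embedding for each row is generated from the image at insert time.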

In [6]:
from pathlib import Path

table.bulk_insert([
    ImageSearch(image_uri=Path(local_image_path))
    for local_image_path in local_image_paths
])

## Search for similar images using a text query

In [9]:
results = table.search(query="shiba inu").limit(2).to_list()

result_image_paths = [result["image_uri"] for result in results]

ipyplot.plot_images(result_image_paths, img_width=200, force_b64=True)