## Building Multimodal Search with Vector Databases

This notebook demonstrates how to build multimodal search (image, audio, video) with the `Meta AI ImageBind` model ([multi2vec-bind](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/multi2vec-bind)).

ImageBind allows us to search through text, images, audio and video files.

This recipe focuses on searching through images, audio, and video:

* [text-to-media search](#text-to-media-search) - provide text as input to search through media
* [image-to-media search](#image-to-media-search) - provide image as input to search through media
* [audio-to-media search](#audio-to-media-search) - provide audio as input to search through media
* [video-to-media search](#video-to-media-search) - provide video as input to search through media
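All four searches rely on ImageBind mapping every modality into one shared vector space, so a query of one type can be compared directly against stored vectors of any other type. The sketch below illustrates the idea with cosine similarity on made-up 3-dimensional vectors. The file names and numbers are hypothetical, real ImageBind embeddings are much larger, and in this notebook Weaviate performs the comparison itself (its default metric is cosine distance, i.e. 1 minus cosine similarity).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vector, items, limit=3):
    """Return item names ordered by similarity to the query, most similar first."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:limit]]

# Hypothetical toy vectors: objects of different media types share one space,
# so a single query vector can be compared against every one of them.
items = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "dog_bark.wav": [0.8, 0.2, 0.1],
    "cat.mp4": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.0]  # stand-in for the vector of a text query such as "dog"
print(rank(query, items, limit=2))
```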

### Weaviate Setup

The ImageBind model is only available with local Weaviate deployments running on Docker or Kubernetes.

ImageBind is not supported with Weaviate Cloud Services (WCS).

### Steps to deploy Weaviate locally with ImageBind

1. Locate the Docker Compose file.
    A prepared Docker Compose file at `/2-multimodal/docker-compose.yml` contains the configuration needed to run Weaviate with `Meta's ImageBind` model.

    Navigate to the multimodal folder:
    ```bash
    cd 2-multimodal
    ```

2. Run Weaviate & ImageBind with Docker Compose

    > If you are new to `Docker Compose`, [here are instructions on how to install it](https://docs.docker.com/compose/install/).

    To start the services defined in the `docker-compose.yml` file, run:

    ```bash
    docker compose up
    ```
    
    > Note #1 - the first time you run the command, Docker will download a ~6GB image.
    
    > Note #2 - after the image is downloaded (or when you restart the containers), Weaviate usually takes 30-60 seconds to become ready.

    > Note #3 - to shut down the running containers, press CTRL+C in the terminal where `docker compose up` is running.
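
Because of the start-up delay in Note #2, a cell run right after `docker compose up` may fail to connect. A small polling helper can wait for the service instead. This is a sketch: `wait_until_ready` and `weaviate_ready` are hypothetical names, and it assumes Weaviate listens on `localhost:8080` (check the ports in your `docker-compose.yml`) and serves its readiness probe at `/v1/.well-known/ready`.

```python
import time
import urllib.request

def wait_until_ready(check, timeout=120.0, interval=2.0):
    """Call `check()` until it returns True or `timeout` seconds pass.

    Returns True if the check succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

def weaviate_ready(url="http://localhost:8080/v1/.well-known/ready"):
    """Assumed readiness probe: True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused or reset connections
        return False

# Usage (requires the containers to be starting up):
# wait_until_ready(weaviate_ready)
```

Taking the check as a callable keeps the waiting logic independent of how readiness is probed; once a client exists, `client.is_ready` could be passed in its place.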


### Dependencies

1. The Weaviate Python Client

In [None]:
! pip install --pre -I "weaviate-client==4.4.0"

### Connect to Weaviate

In [None]:
import weaviate, os

client = weaviate.connect_to_local()

client.is_ready()

In [None]:
client.get_meta()

### Create the `Animals` Collection

In [None]:
import weaviate.classes as wvc

# Delete the collection if it already exists, so this cell can be re-run
if client.collections.exists("Animals"):
    client.collections.delete("Animals")

client.collections.create(
    name="Animals",
    vectorizer_config=wvc.config.Configure.Vectorizer.multi2vec_bind(
        audio_fields=["audio"],
        image_fields=["image"],
        video_fields=["video"],
    )
)

In [4]:
import base64

# Helper function to convert a file to base64 representation
def toBase64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')
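Every media file is sent to Weaviate as a base64 string, which is about a third larger than the raw bytes (4 characters per 3 bytes), worth keeping in mind for large video files. The sketch below sanity-checks the `toBase64` helper on a temporary dummy file; the payload is arbitrary test data, not one of the notebook's media files.

```python
import base64
import os
import tempfile

def toBase64(path):
    # Same helper as above: read the file's bytes and base64-encode them
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')

# Write 1024 bytes of dummy data to stand in for a media file
payload = bytes(range(256)) * 4
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

encoded = toBase64(tmp_path)
os.remove(tmp_path)

print(base64.b64decode(encoded) == payload)  # -> True (lossless round trip)
print(len(payload), len(encoded))            # -> 1024 1368
```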


### Insert Images into Weaviate

In [None]:
animals = client.collections.get("Animals")

source = os.listdir("./source/image/")
items = list()

for name in source:
    
    print(f"Adding {name}")
    
    path = "./source/image/" + name
    
    items.append({
        "name": name,            # name of the file
        "path": path,            # path to the file to display result
        "image": toBase64(path), # this gets vectorized - "image" was configured in vectorizer_config as the property holding images
        "mediaType": "image",    # a label telling us how to display the resource 
    })

    # import images in batches of 5
    if len(items) == 5:
        print("Inserting 5 new image objects.")
        animals.data.insert_many(items)
        items.clear()

# Insert any remaining items
if (len(items) > 0):
    print(f"Inserting remaining ({len(items)}) items.")
    animals.data.insert_many(items)
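The insert loops above interleave file reading with manual batch bookkeeping. A sketch of the same pattern with two hypothetical helpers: `batched` yields fixed-size chunks (the last one may be shorter), and `media_files` skips hidden entries such as `.DS_Store` and subfolders, which `os.listdir` would otherwise return and the loop would try to encode.

```python
import os

def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # the final, possibly shorter, batch

def media_files(folder):
    """Sorted regular files in `folder`, skipping hidden entries like .DS_Store."""
    return sorted(
        name for name in os.listdir(folder)
        if not name.startswith(".") and os.path.isfile(os.path.join(folder, name))
    )
```

With these, the image import could iterate `for batch in batched(media_files("./source/image/"), 5):` and build one `insert_many` call per batch, with no separate "remaining items" step.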

In [None]:
# Object count
animals = client.collections.get("Animals")
animals.aggregate.over_all()

### Insert Audio Files into Weaviate

In [None]:
animals = client.collections.get("Animals")

source = os.listdir("./source/audio/")
items = list()

for name in source:
    print(f"Adding {name}")
    
    path = "./source/audio/" + name
    items.append({
        "name": name,
        "path": path,
        "audio": toBase64(path),
        "mediaType": "audio"
    })

    # import audio files in batches of 3
    if len(items) == 3:
        print("Inserting 3 new audio objects.")
        animals.data.insert_many(items)
        items.clear()

# Insert any remaining items
if (len(items) > 0):
    print(f"Inserting remaining ({len(items)}) items.")
    animals.data.insert_many(items)

In [None]:
animals.aggregate.over_all()

### Insert Video Files into Weaviate

In [None]:
animals = client.collections.get("Animals")

source = os.listdir("./source/video/")

for name in source:
    print(f"Adding {name}")
    
    path = "./source/video/" + name
    item = {
        "name": name,
        "path": path,
        "video": toBase64(path),
        "mediaType": "video"
    }
    
    # insert videos one by one
    animals.data.insert(item)
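The image, audio, and video cells build objects the same way; only the property that holds the base64 data changes, and it must be one of the fields named in `vectorizer_config` (`image`, `audio`, `video`). A hypothetical `build_item` helper capturing that shared shape might look like this:

```python
import base64
import os

# Must match the fields passed to multi2vec_bind(...) when creating "Animals"
MEDIA_PROPERTIES = {"image", "audio", "video"}

def build_item(path, media_type):
    """Build one Animals object for the file at `path`."""
    if media_type not in MEDIA_PROPERTIES:
        raise ValueError(f"Unsupported media type: {media_type!r}")
    with open(path, "rb") as file:
        encoded = base64.b64encode(file.read()).decode("utf-8")
    return {
        "name": os.path.basename(path),  # name of the file
        "path": path,                    # used later to display the result
        media_type: encoded,             # the vectorized media property, e.g. "video"
        "mediaType": media_type,         # label telling display_media how to render it
    }
```

For example, the video loop body becomes `animals.data.insert(build_item("./source/video/" + name, "video"))`.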

In [None]:
animals.aggregate.over_all()

In [None]:
agg = animals.aggregate.over_all(
    group_by="mediaType"
)

for group in agg.groups:
    print(group)


### Check all the media files added to the Vector Database

In [None]:
itr = animals.iterator(
    return_properties=["name", "mediaType"],
    # include_vector=True, # in case you want to see the vectors
)

for item in itr:
    print(item.properties)
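The `group_by="mediaType"` aggregation earlier reports how many objects of each type were stored. The same tally can be computed client-side from the iterated properties as a cross-check. `count_by_media_type` is a hypothetical helper, shown here on a made-up sample rather than real query output.

```python
from collections import Counter

def count_by_media_type(properties_list):
    """Count objects per mediaType from an iterable of property dicts."""
    return Counter(props["mediaType"] for props in properties_list)

# Hypothetical sample standing in for the properties yielded by animals.iterator(...)
sample = [
    {"name": "dog1.jpg", "mediaType": "image"},
    {"name": "dog2.jpg", "mediaType": "image"},
    {"name": "bark.wav", "mediaType": "audio"},
    {"name": "meerkat.mp4", "mediaType": "video"},
]
print(count_by_media_type(sample))
```

Against the live collection, the argument would be `(item.properties for item in animals.iterator(return_properties=["mediaType"]))`.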

## Multimodal Search
### Helper functions

In [15]:
# Helper functions to display results
import json
from IPython.display import Image, Audio, Video, display

def json_print(data):
    print(json.dumps(data, indent=2))

def display_media(item):
    path = item["path"]

    if item["mediaType"] == "image":
        display(Image(path))

    elif item["mediaType"] == "video":
        display(Video(path))

    elif item["mediaType"] == "audio":
        display(Audio(path))

In [26]:
import base64, requests

# Helper function – get base64 representation from an online image
def url_to_base64(url):
    image_response = requests.get(url)
    image_response.raise_for_status()  # fail loudly instead of encoding an error page
    content = image_response.content
    return base64.b64encode(content).decode('utf-8')

# Helper function - get base64 representation from a local file
def file_to_base64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')

# Update the url and path to test
#test_image_base64 = url_to_base64("https://path-to-some-online-image.jpg")
#test_file_base64 = file_to_base64("./test/meerkat.jpeg")

<a id='text-to-media-search'></a>
### Text to Media Search

In [13]:
response = animals.query.near_text(
    query="dog with stick",
    return_properties=['name','path','mediaType'],
    limit=3
)
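This cell stores the results in `response` without showing them; the `for obj in response.objects:` loop with `json_print` and `display_media` used in the other search sections works here as well. To see how close each match is, the v4 client can also return the vector distance (lower means more similar) when `return_metadata=wq.MetadataQuery(distance=True)` is passed to the query, with `wq` being `weaviate.classes.query`. The hypothetical `summarize` helper below condenses results into `(name, mediaType, distance)` tuples; it is exercised on stand-in objects built with `SimpleNamespace`, since it only relies on the `.properties` and `.metadata.distance` attributes.

```python
from types import SimpleNamespace

def summarize(objects):
    """Turn query results into (name, mediaType, distance) tuples.

    `distance` is None unless the query requested it via return_metadata.
    """
    rows = []
    for obj in objects:
        distance = getattr(obj.metadata, "distance", None)
        rows.append((obj.properties["name"], obj.properties["mediaType"], distance))
    return rows

# Stand-in objects for illustration; real ones come from response.objects
fake = [
    SimpleNamespace(properties={"name": "dog-stick.jpg", "mediaType": "image"},
                    metadata=SimpleNamespace(distance=0.71)),
    SimpleNamespace(properties={"name": "dog-bark.wav", "mediaType": "audio"},
                    metadata=SimpleNamespace(distance=None)),
]
for row in summarize(fake):
    print(row)
```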

<a id='image-to-media-search'></a>
### Image to Media Search

In [None]:
Image("./test/test-cat.jpg")

In [18]:
response = animals.query.near_image(
    near_image=toBase64("./test/test-cat.jpg"),
    return_properties=['name','path','mediaType'],
    limit=3
)

In [None]:
for obj in response.objects:
    json_print(obj.properties)
    display_media(obj.properties)

<a id='audio-to-media-search'></a>
### Audio to Media Search

In [None]:
Audio("./test/dog_audio.wav")

In [20]:
import weaviate.classes.query as wq

response = animals.query.near_media(
    media=toBase64("./test/dog_audio.wav"),
    media_type=wq.NearMediaType.AUDIO,
    return_properties=['name','path','mediaType'],
    limit=3
)

In [None]:
for obj in response.objects:
    json_print(obj.properties)
    display_media(obj.properties)

<a id='video-to-media-search'></a>
### Video to Media Search

In [None]:
Video("./test/test-meerkat.mp4")

In [150]:
response = animals.query.near_media(
    media=toBase64("./test/test-meerkat.mp4"),
    media_type=wq.NearMediaType.VIDEO,
    return_properties=['name','path','mediaType'],
    limit=3
)
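The image, audio, and video queries differ only in which method and media type they use, so the choice can be driven by the query file itself. A hypothetical `media_kind` helper is sketched below; its extension table is an assumption that covers the test files used here (`.jpg`, `.jpeg`, `.wav`, `.mp4`) plus a couple of common formats, and which formats the ImageBind module actually accepts is not confirmed by this notebook.

```python
import os

# Assumed mapping from file extension to the kind of query to run
EXTENSION_TO_MEDIA = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".wav": "audio", ".mp3": "audio",
    ".mp4": "video",
}

def media_kind(path):
    """Return 'image', 'audio' or 'video' based on the file extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return EXTENSION_TO_MEDIA[ext]
    except KeyError:
        raise ValueError(f"Unrecognised media file: {path}") from None
```

An `"image"` result would go to `animals.query.near_image(...)`, while `"audio"` and `"video"` map to `wq.NearMediaType.AUDIO` and `wq.NearMediaType.VIDEO` in `animals.query.near_media(...)`, as in the cells above.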