## Building Multimodal Search with Vector Databases

This notebook demonstrates how to build multimodal search (image, audio, video) with the Meta AI ImageBind model ([multi2vec-bind](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/multi2vec-bind)).

ImageBind allows us to search through text, images, audio and video files.

This recipe focuses on searching through images, audio and video:

* [text-to-media search](#text-to-media-search) - provide text as input to search through media
* [image-to-media search](#image-to-media-search) - provide an image as input to search through media
* [audio-to-media search](#audio-to-media-search) - provide an audio file as input to search through media
* [video-to-media search](#video-to-media-search) - provide a video as input to search through media

### Weaviate Setup

The ImageBind model is only available with local Weaviate deployments with Docker or Kubernetes.

ImageBind is not supported with Weaviate Cloud Services (WCS).

### Steps to deploy Weaviate locally with ImageBind

1. Locate the Docker Compose file.
    A prepared Docker Compose file at `/2-multimodal/docker-compose.yml` contains the configuration needed to run Weaviate with Meta's ImageBind model.

    Navigate to the multimodal folder:
    ```
    cd 2-multimodal
    ```

2. Run Weaviate & ImageBind with Docker Compose

    > If you are new to `Docker Compose`, [here are instructions on how to install it](https://docs.docker.com/compose/install/).

    To start the services defined in the `docker-compose.yml` file, run:

    ```bash
    docker compose up
    ```
    
    > Note #1 - the first time you run the command, Docker will download a ~6GB image.
    
    > Note #2 - after the image is downloaded (or whenever the containers are restarted), it usually takes 30-60 seconds for Weaviate and ImageBind to be ready.

    > Note #3 - to shut down the running containers, press CTRL+C in the terminal where `docker compose up` is running.
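
Because the containers need 30-60 seconds to become ready, a script that connects right after `docker compose up` may fail on its first attempt. The sketch below polls a readiness check until it succeeds or a timeout expires. `wait_until_ready` and its parameters are hypothetical names introduced for illustration; the `check` argument can be any zero-argument callable, such as the `client.is_ready` method used later in this notebook.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout: float = 90.0,
    interval: float = 2.0,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Errors are expected while the server is still starting up,
            # so they are treated the same as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Once a client exists, `wait_until_ready(client.is_ready)` would block until the server answers. Whether creating the client itself raises before the server is up depends on the client version, so the connection call may also need to live inside `check`.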


### Dependencies

1. The Weaviate Python Client

In [None]:
! pip install --pre -I "weaviate-client==4.4.b1"

### Connect to Weaviate

In [None]:
import weaviate, os
import weaviate.classes as wvc

# Connect to your local Docker instance
# TODO
client = None

# And check if the connection is ready
client.is_ready()

In [None]:
client.get_meta()

### Create the `Animals` Collection

In [15]:
# Delete the collection first, so this code can be rerun safely
if client.collections.exists("Animals"):
    client.collections.delete("Animals")

# Create a new collection with multi2vec bind
# use audio, image and video as the media fields
# TODO

<weaviate.collections.collection.Collection at 0x10a897350>

In [4]:
import base64

# Helper function to convert a file to base64 representation
def toBase64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')
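
Base64 encoding turns every 3 bytes of input into 4 characters of output, so an encoded file is roughly a third larger than the original. That overhead is one reason the larger audio and video files are imported in smaller requests later on. The sketch below checks both the round trip and the size formula on a temporary file; `to_base64` is a snake_case copy of the helper above, included only so the block runs on its own.

```python
import base64
import os
import tempfile


def to_base64(path):
    """Same logic as the notebook's toBase64 helper."""
    with open(path, "rb") as file:
        return base64.b64encode(file.read()).decode("utf-8")


# 1024 bytes of sample data standing in for a media file
payload = bytes(range(256)) * 4

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

try:
    encoded = to_base64(tmp_path)
finally:
    os.remove(tmp_path)

# Decoding must give back the original bytes
round_trip_ok = base64.b64decode(encoded) == payload

# Padded base64 length is 4 * ceil(n / 3)
expected_length = 4 * ((len(payload) + 2) // 3)

print(round_trip_ok, len(payload), len(encoded), expected_length)
```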


### Insert Images into Weaviate

In [None]:
source = os.listdir("./source/image/")

items = list()

for name in source:
    
    print(f"Adding {name}")
    
    path = "./source/image/" + name
    
    items.append({
        "name": name,
        "path": path,
        "image": toBase64(path),
        "mediaType": "image"
    })

# TODO - get the Animals collection
# and insert the images


# It is fine to insert all 9 items in one go,
# but bigger objects such as audio or video files should be loaded in smaller batches.
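
`os.listdir` returns every entry in a folder, including hidden files such as `.DS_Store` and any subfolders, and encoding one of those as an image would produce a bad object. The sketch below is one way to keep only the expected files. `list_media_files` and the extension set are assumptions made for illustration; adjust the extensions to whatever the source folder actually contains.

```python
from pathlib import Path

# Assumed set of image extensions, not taken from the notebook
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_media_files(folder, extensions):
    """Return sorted file names in `folder` whose extension is in `extensions`."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file()
        and not entry.name.startswith(".")  # skip hidden files
        and entry.suffix.lower() in extensions
    )
```

With this helper, `source = list_media_files("./source/image/", IMAGE_EXTENSIONS)` could replace the plain `os.listdir` call.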

In [None]:
# Check object count
animals = client.collections.get("Animals")
animals.aggregate.over_all()

### Insert Audio Files into Weaviate

In [None]:
animals = client.collections.get("Animals")

source = os.listdir("./source/audio/")

items = []
counter = 0

for name in source:
    print(f"Adding {name}")
    
    path = "./source/audio/" + name
    items.append({
        "name": name,
        "path": path,
        "audio": toBase64(path),
        "mediaType": "audio"
    })

    # TODO - insert items in batches of 3, then clear the list

# TODO - Insert any remaining items
if items:
    print(f"Inserting remaining ({len(items)}) items. Total counter: {counter}")
    # TODO add code here
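
The batching pattern the TODOs describe (collect items, flush when the batch is full, flush the remainder at the end) can be sketched independently of Weaviate. In the sketch below, `insert_in_batches` is a hypothetical helper and `insert_batch` is a placeholder callable standing in for whichever client call performs the insert; neither is part of the Weaviate API.

```python
def insert_in_batches(items, batch_size, insert_batch):
    """Send `items` to `insert_batch` in lists of at most `batch_size`.

    Returns the total number of items sent.
    """
    batch = []
    counter = 0

    for item in items:
        batch.append(item)

        # Flush as soon as the batch is full
        if len(batch) == batch_size:
            insert_batch(batch)
            counter += len(batch)
            batch = []  # start a fresh list rather than mutating the sent one

    # Flush whatever is left over
    if batch:
        insert_batch(batch)
        counter += len(batch)

    return counter


# Usage with a stand-in insert function that only records what it receives
sent = []
total = insert_in_batches(list(range(8)), 3, sent.append)
print(total, sent)
```

Keeping the batch small bounds the size of each request, which matters because every item carries a base64-encoded media file.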

In [None]:
# Check object count again
animals.aggregate.over_all()

### Insert Video Files into Weaviate

In [None]:
animals = client.collections.get("Animals")

source = os.listdir("./source/video/")

for name in source:
    print(f"Adding {name}")
    
    path = "./source/video/" + name
    item = {
        "name": name,
        "path": path,
        "video": toBase64(path),
        "mediaType": "video"
    }
    
    # Videos are big, so we should avoid inserting them in bulk
    # TODO: insert video objects one by one

In [137]:
# Check the object count again
animals.aggregate.over_all()

_AggregateReturn(properties={}, total_count=26)

In [None]:
# TODO - group by "mediaType" to see how many objects we have in each group


### Check all the media files added to the Vector Database

In [None]:
itr = animals.iterator(
    return_properties=["name", "mediaType"],
    # include_vector=True, # in case you want to see the vectors
)

for item in itr:
    print(item.properties)
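
Each object printed above carries a `mediaType` property, so the per-type counts can also be tallied on the client side as a cross-check of the import. The sketch below does this with `collections.Counter`. `count_by_media_type` is a hypothetical helper, and the sample records are invented placeholders, not the notebook's actual data.

```python
from collections import Counter


def count_by_media_type(records):
    """Count records per value of their "mediaType" key."""
    return Counter(record["mediaType"] for record in records)


# Placeholder records shaped like the properties printed above
sample = [
    {"name": "example-1.jpg", "mediaType": "image"},
    {"name": "example-2.jpg", "mediaType": "image"},
    {"name": "example-1.wav", "mediaType": "audio"},
    {"name": "example-1.mp4", "mediaType": "video"},
]

counts = count_by_media_type(sample)
print(dict(counts))
```

In the notebook, the same function could be fed from the iterator, for example `count_by_media_type(item.properties for item in animals.iterator(return_properties=["mediaType"]))`.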

# Multimodal Search

> Note: queries usually belong in a separate file or flow. Collection creation and data import are typically one-off steps, while queries run on a daily basis. Keeping them apart avoids recreating the collection and reimporting the data every time you search.

## Helper functions

In [15]:
# Helper functions to display results
import json
from IPython.display import Image, Audio, Video, display

def json_print(data):
    print(json.dumps(data, indent=2))

def display_media(item):
    # Render the file at item["path"] with the widget matching its media type
    path = item["path"]

    if item["mediaType"] == "image":
        display(Image(path))

    elif item["mediaType"] == "video":
        display(Video(path))

    elif item["mediaType"] == "audio":
        display(Audio(path))

In [16]:
import base64, requests

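# Helper function - get base64 representation from an online image
def url_to_base64(url):
    image_response = requests.get(url, timeout=30)
    # Fail early on HTTP errors instead of encoding an error page
    image_response.raise_for_status()
    content = image_response.content
    return base64.b64encode(content).decode('utf-8')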

# Helper function - get base64 representation from a local file
def file_to_base64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')

# Update the url and path to test
#test_image_base64 = url_to_base64("https://path-to-some-online-image.jpg")
#test_file_base64 = file_to_base64("./test/meerkat.jpeg")

<a id='text-to-media-search'></a>
### Text to Media Search

In [13]:
# TODO: use near text to search for "dog with stick" and return 3 items
# TODO: make sure we capture the result as "response" 
# response = animals.query.

In [None]:
# print results
for obj in response.objects:
    json_print(obj.properties)
    display_media(obj.properties)

<a id='image-to-media-search'></a>
### Image to Media Search

In [None]:
Image("./test/test-cat.jpg")

In [18]:
# TODO: use near image to search with "./test/test-cat.jpg" and return 3 items
# HINT: use toBase64 to convert the file to base64
#    or use Path("./your-file-path-here"),
# response = animals.query.

In [None]:
for obj in response.objects:
    json_print(obj.properties)
    display_media(obj.properties)

<a id='audio-to-media-search'></a>
### Audio to Media Search

In [None]:
Audio("./test/dog_audio.wav")

In [20]:
# TODO: use near audio to search with "./test/dog_audio.wav" and return 3 items
# HINT: use toBase64 to convert the file to base64

# response = animals.query.near_audio(
# )

In [None]:
for obj in response.objects:
    json_print(obj.properties)
    display_media(obj.properties)

<a id='video-to-media-search'></a>
### Video to Media Search

In [None]:
Video("./test/test-meerkat.mp4")

In [150]:
# TODO: use near video to search with "./test/test-meerkat.mp4" and return 3 items
# HINT: use toBase64 to convert the file to base64

# response = animals.query.near_video(
# )

In [None]:
for obj in response.objects:
    json_print(obj.properties)
    display_media(obj.properties)