# Image, Audio and Video Search with Meta AI ImageBind
This recipe demonstrates how to build multi-modal search (image, audio, video) with the `Meta AI ImageBind` model ([multi2vec-bind](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/multi2vec-bind)).

ImageBind allows us to search through text, image, audio and video.

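This recipe focuses on searching through image, audio and video files (searching through text is skipped):
* [text-to-media search](#text-to-media-search) - provide text as input to search through media
* [image-to-media search](#image-to-media-search) - provide an image as input to search through media
* [audio-to-media search](#audio-to-media-search) - provide audio as input to search through media
* [video-to-media search](#video-to-media-search) - provide a video as input to search through media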

## Weaviate Setup

The ImageBind model is only available with local Weaviate deployments with Docker or Kubernetes.

ImageBind is not supported with Weaviate Cloud Services (WCS).

### Steps to deploy Weaviate locally with ImageBind

1. Get a docker compose file.
    
    Run the following command in your terminal:

    ```
    curl -o docker-compose.yml "https://configuration.weaviate.io/v2/docker-compose/docker-compose.yml?bind_model=imagebind&generative_cohere=false&generative_openai=false&generative_palm=false&media_type=bind&modules=modules&ref2vec_centroid=false&reranker_cohere=false&reranker_transformers=false&runtime=docker-compose&weaviate_version=v1.21.8&weaviate_volume=named-volume"
    ```

    This will download a `docker-compose.yml` file for you.

2. Run Weaviate+Bind with Docker Compose

    > If you are new to `Docker Compose`, [here are instructions on how to install it](https://docs.docker.com/compose/install/).

    To start the docker image defined in the `docker-compose.yml` file, call:

    ```
    docker compose up
    ```
    
    > Note #1 - the first time you run the command, Docker will download a ~6GB image.
    
    > Note #2 – to shut down a running docker image, press CMD+C or CTRL+C.

### Dependencies

In [None]:
!pip install weaviate-client

## Configuration

In [2]:
import weaviate

# Connect to Weaviate
client = weaviate.Client(
  url="http://localhost:8080",  # URL to your local Weaviate instance
)

client.is_ready() # Test the connection

True
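
The container may still be loading the model when the notebook first connects, so the readiness check may not succeed immediately. A minimal sketch of a polling helper follows. `wait_until_ready` and its parameters are hypothetical names (they are not part of the Weaviate client); the helper only assumes it is given a zero-argument callable such as `client.is_ready`.

```python
import time


def wait_until_ready(check, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Call check() repeatedly until it returns True or `timeout` seconds pass.

    Hypothetical helper: `check` is any zero-argument callable,
    for example `client.is_ready` from the cell above.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat errors (e.g. connection refused) as "not ready yet"
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Usage (assumes `client` was created as shown above):
# wait_until_ready(client.is_ready)
```

The `sleep` argument is injectable only so the helper can be exercised without real waiting.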

### Create `Animals` collection

The collection has the following key characteristics:
1. Name: `"Animals"`
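2. Vectorizer: `multi2vec-bind`
3. Media properties: `"image"`, `"audio"` and `"video"` - Weaviate will use the values in these properties to generate vectors. Note that you can name them anything you want, as long as the names match `imageFields`, `audioFields` and `videoFields`.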

In [63]:
# Delete the collection if it exists.
# Note you should skip this step if you don't want to reimport the data every time.
if client.schema.exists("Animals"):
    client.schema.delete_class("Animals")

animals = {
    "classes": [
        {
            "class": "Animals",
            "vectorizer": "multi2vec-bind",
            "moduleConfig": {
                "multi2vec-bind": {
                    "textFields": ["name"],
                    "imageFields": ["image"],
                    "audioFields": ["audio"],
                    "videoFields": ["video"],
                }
            },
        }
    ]
}

client.schema.create(animals)
print("Successfully created Animals collection.")

Successfully created Animals collection.
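
The definition above does not declare any properties, so their data types are left to be inferred. A sketch of the same class with the properties declared explicitly is shown below. The `blob` data type for the media fields and `text` for the remaining fields are assumptions based on how Weaviate's multi2vec modules are typically configured, and `build_animals_class` is a hypothetical helper name.

```python
def build_animals_class(name="Animals"):
    """Build a class definition with explicit properties (hypothetical helper)."""
    text_fields = ["name", "path", "mediaType"]
    media_fields = ["image", "audio", "video"]

    # Assumption: text for metadata, blob for base64-encoded media
    properties = [{"name": f, "dataType": ["text"]} for f in text_fields]
    properties += [{"name": f, "dataType": ["blob"]} for f in media_fields]

    return {
        "class": name,
        "vectorizer": "multi2vec-bind",
        "moduleConfig": {
            "multi2vec-bind": {
                "textFields": ["name"],
                "imageFields": ["image"],
                "audioFields": ["audio"],
                "videoFields": ["video"],
            }
        },
        "properties": properties,
    }


# Usage (assumes `client` from the Configuration cell):
# client.schema.create_class(build_animals_class())
```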


### Import Media
For every object, we will store:
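* `name` - the file name
* `path` - path to the file, so that we can display the returned media at query time.
* `mediaType` - the kind of media (`image`, `audio` or `video`), used to pick the right display widget.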
* (one of the following) media:
    * `image` - a base64 representation of the image file, Weaviate will use it to generate a vector - see `imageFields`.
    * `audio` - a base64 representation of the audio file, Weaviate will use it to generate a vector - see `audioFields`.
    * `video` - a base64 representation of the video file, Weaviate will use it to generate a vector - see `videoFields`.


In [5]:

import base64

# Helper function to convert a file to base64 representation
def toBase64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')
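
As a quick sanity check of the helper, the sketch below writes a few bytes to a temporary file, encodes it, and decodes the result again. The function is repeated so the block runs on its own; the temporary file and its contents are made up for illustration.

```python
import base64
import os
import tempfile


# Same helper as above, repeated so this block is self-contained
def toBase64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')


# Write a small binary file to disk (illustrative content only)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"\x00\x01hello")
    tmp_path = tmp.name

encoded = toBase64(tmp_path)         # base64 text, safe to send as JSON
decoded = base64.b64decode(encoded)  # back to the original bytes
os.remove(tmp_path)

print(decoded == b"\x00\x01hello")
```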

#### Import images

In [64]:
# List of source images 
source = ["cat1.jpg", "cat2.jpg", "cat3.jpg",
          "dog1.jpg", "dog2.jpg", "dog3.jpg",
          "meerkat1.jpg", "meerkat2.jpg", "meerkat3.jpg"]

client.batch.configure(batch_size=3)  # Load images in batches of 3
with client.batch as batch:

    for name in source:
        print(f"Adding {name}")

        # Build the path to the image file
        path = "./source/image/" + name

        # Object to store in Weaviate
        properties = {
            "name": name,
            "path": path,
            "image": toBase64(path), # Weaviate will use the base64 representation of the file to generate a vector.
            "mediaType": "image"
        }

        # Add the object to Weaviate
        batch.add_data_object(properties, "Animals")

Adding cat1.jpg
Adding cat2.jpg
Adding cat3.jpg
Adding dog1.jpg
Adding dog2.jpg
Adding dog3.jpg
Adding meerkat1.jpg
Adding meerkat2.jpg
Adding meerkat3.jpg


In [None]:
# client.batch.delete_objects(
#     class_name='Animals',
#     where={
#         'path': ['mediaType'],
#         'operator': 'Equal',
#         'valueText': 'video'
#     },
# )

#### Import Audio

### TODO
1. Add 4 audio files to `source/audio`
2. Test import
3. Then test the [audio-to-media-search](#audio-to-media-search)

In [None]:
# List of source audio files
source = [
    # "cat-A.mp3", "cat-B.mp3",
    # "dog-A.mp3", "dog-B.mp3",
]

client.batch.configure(batch_size=3)  # Load audio files in batches of 3
with client.batch as batch:

    for name in source:
        print(f"Adding {name}")

        # Build the path to the audio file
        path = "./source/audio/" + name

        # Object to store in Weaviate
        properties = {
            "name": name,
            "path": path,
            "audio": toBase64(path), # Weaviate will use the base64 representation of the file to generate a vector.
            "mediaType": "audio"
        }

        # Add the object to Weaviate
        batch.add_data_object(properties, "Animals")

#### Import Video

In [67]:
# List of source videos
source = [
    "cat-clean.mp4", "cat-play.mp4",
    "dog-high-five.mp4", "dog-with-stick.mp4",
    "meerkat-dig.mp4", "meerkat-watch.mp4"
]

client.batch.configure(batch_size=1)  # Load videos in batches of 1, as these might be big files
with client.batch as batch:

    for name in source:
        print(f"Adding {name}")

        # Build the path to the video file
        path = "./source/video/" + name

        # Object to store in Weaviate
        properties = {
            "name": name,
            "path": path,
            "video": toBase64(path), # Weaviate will use the base64 representation of the file to generate a vector.
            "mediaType": "video"
        }

        # Add the object to Weaviate
        batch.add_data_object(properties, "Animals")

Adding meerkat-dig.mp4
Adding meerkat-watch.mp4
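
The three import cells differ only in the source folder and in the property that holds the base64 data. A sketch of one helper that builds the object for any of the three media types is shown below. `build_properties`, `MEDIA_BY_EXTENSION` and the extension list are hypothetical; the folder layout (`./source/<mediaType>/<name>`) follows the cells above, and `encode` is expected to be a function such as `toBase64`.

```python
import os

# Hypothetical mapping from file extension to media type
MEDIA_BY_EXTENSION = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video",
}


def build_properties(name, encode, root="./source"):
    """Build the object to store in Weaviate for one media file.

    `encode` is a callable that turns a file path into a base64 string
    (for example the `toBase64` helper defined earlier).
    """
    extension = os.path.splitext(name)[1].lower()
    media_type = MEDIA_BY_EXTENSION.get(extension)
    if media_type is None:
        raise ValueError(f"Unsupported file extension: {name}")

    path = f"{root}/{media_type}/{name}"
    return {
        "name": name,
        "path": path,
        media_type: encode(path),  # stored under "image", "audio" or "video"
        "mediaType": media_type,
    }


# Usage inside a batch (assumes `client` and `toBase64` from earlier cells):
# with client.batch as batch:
#     for name in source:
#         batch.add_data_object(build_properties(name, toBase64), "Animals")
```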


### Check number of objects in the Animals collection

In [27]:
# Display the number of objects in the Animals collection
client.query.aggregate("Animals").with_meta_count().do()

{'data': {'Aggregate': {'Animals': [{'meta': {'count': 10}}]}}}
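
The count sits several levels deep in the response. A small sketch of pulling it out is shown below; `extract_count` is a hypothetical helper, and the sample dictionary simply repeats the output above.

```python
def extract_count(response, collection):
    """Return the object count from an aggregate response (hypothetical helper)."""
    return response["data"]["Aggregate"][collection][0]["meta"]["count"]


# Sample response copied from the output above
sample = {'data': {'Aggregate': {'Animals': [{'meta': {'count': 10}}]}}}
print(extract_count(sample, "Animals"))  # prints 10
```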

## Query examples

In [68]:
# Helper functions to display results
import json
from IPython.display import Image, Audio, Video, display

def json_print(data):
    print(json.dumps(data, indent=2))

def display_media(item):
    path = item["path"]

    if item["mediaType"] == "image":
        display(Image(path))

    elif item["mediaType"] == "video":
        display(Video(path))

    elif item["mediaType"] == "audio":
        display(Audio(path))

### Text to Media search

In [75]:
# Search for media with "dog with stick", "cat playing with mouse", "dog high five", "puppy"
response = (
    client.query
    .get("Animals", "name path mediaType")
    .with_near_text(
        {"concepts": "dog with stick"}
        # {"concepts": "cat playing with mouse"}
        # {"concepts": "dog high five"}
        # {"concepts": "puppy"}
    )
    .with_limit(3)
    .do()
)

# Print results
result = response["data"]["Get"]["Animals"]

json_print(result)

# Display the first result
display_media(result[0])

[
  {
    "mediaType": "video",
    "name": "dog-with-stick.mp4",
    "path": "./source/video/dog-with-stick.mp4"
  },
  {
    "mediaType": "image",
    "name": "dog2.jpg",
    "path": "./source/image/dog2.jpg"
  },
  {
    "mediaType": "image",
    "name": "dog1.jpg",
    "path": "./source/image/dog1.jpg"
  }
]
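
Because images, audio and video share one collection, a text query can return any media type, as the mixed results above show. A sketch of restricting a query to a single type with a `where` filter on the `mediaType` property follows. `media_type_filter` is a hypothetical helper; the filter shape mirrors the one used in the commented-out delete cell earlier.

```python
def media_type_filter(media_type):
    """Build a where filter that keeps only one media type (hypothetical helper)."""
    return {
        "path": ["mediaType"],
        "operator": "Equal",
        "valueText": media_type,
    }


# Usage (assumes `client` from the Configuration cell):
# response = (
#     client.query
#     .get("Animals", "name path mediaType")
#     .with_near_text({"concepts": ["dog with stick"]})
#     .with_where(media_type_filter("video"))
#     .with_limit(3)
#     .do()
# )
```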


### Image to Media search

In [72]:
# Search for media similar to the provided image: test-meerkat, test-dog or test-cat
response = (
    client.query
    .get("Animals", "name path mediaType")
    .with_near_image(
        # {"image": "./test/test-meerkat.jpg"}, # Use file path as the input for the query
        # {"image": "./test/test-dog.jpg"}, # Use file path as the input for the query
        {"image": "./test/test-cat.jpg"}, # Use file path as the input for the query
    )
    .with_limit(5)
    .do()
)

# Print results
result = response["data"]["Get"]["Animals"]
json_print(result)

# Display the first result
display_media(result[0])

[
  {
    "mediaType": "image",
    "name": "cat1.jpg",
    "path": "./source/image/cat1.jpg"
  },
  {
    "mediaType": "image",
    "name": "cat2.jpg",
    "path": "./source/image/cat2.jpg"
  },
  {
    "mediaType": "image",
    "name": "cat3.jpg",
    "path": "./source/image/cat3.jpg"
  },
  {
    "mediaType": "video",
    "name": "cat-clean.mp4",
    "path": "./source/video/cat-clean.mp4"
  },
  {
    "mediaType": "video",
    "name": "cat-play.mp4",
    "path": "./source/video/cat-play.mp4"
  }
]
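
The list above is ordered by similarity but does not show how close each match is. The sketch below assumes the query is extended with `.with_additional(["distance"])`, so that each result carries an `_additional.distance` value. `summarize` is a hypothetical helper, and the distances in the sample are made up purely for illustration.

```python
def summarize(results):
    """Format each result as one line, with its distance if present."""
    lines = []
    for item in results:
        distance = item.get("_additional", {}).get("distance")
        label = "n/a" if distance is None else f"{distance:.3f}"
        lines.append(f"{item['mediaType']}: {item['name']} (distance {label})")
    return lines


# Illustrative sample; the distance values are invented, not real query output
sample_results = [
    {"mediaType": "image", "name": "cat1.jpg", "_additional": {"distance": 0.1234}},
    {"mediaType": "video", "name": "cat-play.mp4"},
]

for line in summarize(sample_results):
    print(line)
```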


### Audio to Media search

### TODO:
1. Add `test-cat.mp3`, `test-dog.mp3` (or `.wav`) to `test`, the folder the query below reads from
2. Test each audio input works

In [None]:
# Search for media similar to the provided audio: test-cat, test-dog or test-meerkat
response = (
    client.query
    .get("Animals", "name path mediaType")
    .with_near_audio(
        {"audio": "./test/test-cat.mp3"}, # Use file path as the input for the query
        # {"audio": "./test/test-dog.mp3"}, # Use file path as the input for the query
        # {"audio": "./test/test-meerkat.mp3"}, # Use file path as the input for the query
    )
    .with_limit(5)
    .do()
)

# Print results
result = response["data"]["Get"]["Animals"]
json_print(result)

# Display the first result
display_media(result[0])

### Video to Media search

In [82]:
# Search for media similar to the provided video: test-dog, test-cat or test-meerkat
response = (
    client.query
    .get("Animals", "name path mediaType")
    .with_near_video(
        # {"video": "./test/test-dog.mp4"}, # Use file path as the input for the query
        # {"video": "./test/test-cat.mp4"}, # Use file path as the input for the query
        {"video": "./test/test-meerkat.mp4"}, # Use file path as the input for the query
    )
    .with_limit(3)
    .do()
)

# Print results
result = response["data"]["Get"]["Animals"]
json_print(result)

# Display the first result
display_media(result[0])

[
  {
    "mediaType": "video",
    "name": "meerkat-watch.mp4",
    "path": "./source/video/meerkat-watch.mp4"
  },
  {
    "mediaType": "video",
    "name": "meerkat-dig.mp4",
    "path": "./source/video/meerkat-dig.mp4"
  },
  {
    "mediaType": "image",
    "name": "meerkat3.jpg",
    "path": "./source/image/meerkat3.jpg"
  }
]
