Welcome to the Image Search Application! This project demonstrates how to leverage OVHcloud's Managed PostgreSQL Databases alongside OpenAI's CLIP model to create an image search application.
The architecture consists of several key components:
- OVHcloud Managed PostgreSQL Database: Stores image embeddings.
- OVHcloud Object Storage: Holds the actual images. (Provided For You)
- OVHcloud Compute Instances: Hosts the application, utilizing the powerful OpenAI CLIP model for processing.
- 01_create_table.py: Prepares the PostgreSQL database by enabling necessary extensions and creating tables if they do not exist.
- 02_process_images.py: Fetches images from OVHcloud Object Storage, computes their embeddings using the CLIP model, and stores them in the database.
- 03_db_properties.py: Verifies via the command line that search is working as expected.
- 04_find_images.py: Generates an embedding from user input text and queries the database for matches, making the application interactive.
- app.py: Offers a user-friendly interface where users can enter search queries and view results seamlessly.
First, create an OVHcloud compute instance and a Managed PostgreSQL database. You can skip these steps if you already have these services available.
- Create a PostgreSQL database from the OVHcloud control panel. Follow the database creation guide and the PostgreSQL configuration guide.
- Create an OVHcloud Public Cloud compute instance (e.g. b3-8 located in Virginia with the Ubuntu 24.04 image) connected to the public network. Useful documentation: Guide - compute instance creation and Guide - SSH key creation.
Note
Note down the IP address of the instance. You will need to add it to the list of authorized IPs for the database.
- Save the PostgreSQL URI from the OVHcloud control panel with the correct username and password. The scripts use this URI to connect to the database.
Note
A default user already exists; you can find it under the Users tab of the database details. You can reset the user's password by clicking on the three dots. Embed the username and password in your URI and note it down. You will need it to connect to the database.
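As an illustration of embedding the username and password in the URI, here is a minimal sketch using only the Python standard library. The function name embed_credentials, the host, the port, and the credentials are all hypothetical. The sketch percent-encodes the username and password so that characters such as "@" or "/" in a password are not read as URI delimiters.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def embed_credentials(uri: str, username: str, password: str) -> str:
    """Return `uri` with `username:password@` placed before the host.

    Any credentials already present in `uri` are replaced.
    """
    parts = urlsplit(uri)
    # Drop existing credentials, keeping only host[:port].
    host_port = parts.netloc.rsplit("@", 1)[-1]
    # Percent-encode so characters such as "@", ":" or "/" in the
    # password cannot be mistaken for URI delimiters.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{host_port}"))


if __name__ == "__main__":
    # Hypothetical host, port, user, and password, for illustration only.
    template = "postgres://db.example.net:20184/defaultdb?sslmode=require"
    print(embed_credentials(template, "exampleuser", "p@ss/word"))
```

If your password contains only letters and digits, the encoding step changes nothing and you can simply paste the values into the URI by hand.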
- From the OVHcloud control panel, add the IP address of the compute instance obtained in step 2 to the "Authorised IPs" of the PostgreSQL database.
- Connect to the compute instance via SSH and check whether Python is installed (install it if not)
sudo apt update && sudo apt install python3 -y && sudo apt install python3-venv -y
- Clone the GitHub repository containing these scripts to the compute instance, then cd to the directory containing the scripts.
git clone --depth 1 https://github.com/pisymbol314/postgresql-clip.git
cd postgresql-clip
- Create a virtual environment to keep package installation local to this directory
python3 -m venv venv
- Activate the virtual environment.
source venv/bin/activate
- Install Python packages we need
TMPDIR=/home/ubuntu python3 -m pip install -r requirements.txt --no-cache-dir
Note
Change TMPDIR to a different directory with enough free space if you are using any other OS.
Sometimes we've seen the Python clip.load function fail to download the CLIP model, presumably due to the source server being busy. The code here will use a local copy of the model if it's available. To make that local copy:
mkdir models
curl https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt --output models/ViT-B-32.pt
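The "use a local copy if it's available" behaviour can be sketched as follows. This is not the repository's code. The helper names are hypothetical, and it assumes that the long hexadecimal segment in the download URL is the SHA-256 of the file, and that clip.load accepts either a model name or a checkpoint path.

```python
import hashlib
from pathlib import Path

# Assumption: the path segment before the file name in the download URL
# is the SHA-256 checksum of ViT-B-32.pt.
EXPECTED_SHA256 = "40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_model_source(models_dir: Path, file_name: str = "ViT-B-32.pt",
                         model_name: str = "ViT-B/32") -> str:
    """Return the local checkpoint path if it exists, else the model name."""
    local = models_dir / file_name
    return str(local) if local.is_file() else model_name


if __name__ == "__main__":
    source = resolve_model_source(Path("models"))
    print("model source:", source)
    if source != "ViT-B/32":
        # Only a local file can be checked against the expected hash.
        print("checksum ok:", sha256_of(Path(source)) == EXPECTED_SHA256)
```

A checksum mismatch usually means the download was interrupted, in which case rerunning the curl command is the simplest fix.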
- Copy the template environment file, and then edit the .env file to insert the credentials (URI) needed to connect to the database. You will also need to enter the S3 object storage access key and secret.
Note
The database credentials and the object storage access key and secret are mandatory for this demo. The database credentials (URI) were obtained in step 3 and must be assigned to the variable PG_SERVICE_URI. Contact the OVHcloud US team for the Object Storage access key and secret. These must be assigned to the variables S3_ACCESS_KEY and S3_SECRET_KEY respectively.
Keys were provided for workshop purposes and will not be usable after 23 Jan 2025.
Do not add your keys into a forked or cloned repo.
cp .env_example .env
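Before running the scripts, it can help to confirm that the three variables named above are actually filled in. The following is a minimal sketch using only the standard library. The parser and helper names are hypothetical, it handles only simple KEY=VALUE lines, and the repository may load the file differently.

```python
from pathlib import Path

# Variable names taken from the note above.
REQUIRED = ("PG_SERVICE_URI", "S3_ACCESS_KEY", "S3_SECRET_KEY")


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_settings(values: dict[str, str]) -> list[str]:
    """List required variables that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.is_file():
        print("No .env file found; run: cp .env_example .env")
    else:
        missing = missing_settings(read_env_file(env_path))
        print("Missing settings:", ", ".join(missing) if missing else "none")
```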
- Enable pgvector and set up the table we need in the database
./01_create_table.py
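For orientation, the statements that a setup step of this kind typically sends to PostgreSQL look roughly like the sketch below. The table name pictures and its column names are hypothetical, and the repository's script may differ. The 512 dimension is the output size of the CLIP ViT-B/32 model, and the %s placeholders assume a psycopg-style driver.

```python
EMBEDDING_DIM = 512  # output size of CLIP ViT-B/32

# Hypothetical table and column names; the repository's script may differ.
ENABLE_PGVECTOR = "CREATE EXTENSION IF NOT EXISTS vector;"
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS pictures ("
    "filename text PRIMARY KEY, "
    f"embedding vector({EMBEDDING_DIM}));"
)
# Assumed psycopg-style placeholders; the embedding is passed as text
# and cast to pgvector's vector type by the database.
INSERT_ROW = (
    "INSERT INTO pictures (filename, embedding) VALUES (%s, %s::vector) "
    "ON CONFLICT (filename) DO NOTHING;"
)


def to_vector_literal(values):
    """Format numbers as pgvector's text input, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


if __name__ == "__main__":
    print(ENABLE_PGVECTOR)
    print(CREATE_TABLE)
    print(INSERT_ROW)
    print(to_vector_literal([0.25, 0.5, 1]))
```

Both the IF NOT EXISTS clauses and the ON CONFLICT clause make the statements safe to run more than once.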
- Calculate the embeddings for the pictures in the OVHcloud object storage and upload them to the database
./02_process_images.py
- Optionally, verify from the command line that search is working before starting the web app
./03_db_properties.py
- You can run 04_find_images.py to check that everything is working - it looks for images matching the text "man jumping" and reports their filenames
./04_find_images.py
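Conceptually, the search turns the text into an embedding and returns the stored images whose embeddings are closest to it. The sketch below reproduces that ranking in plain Python on made-up three-dimensional vectors. It assumes cosine distance as the metric (pgvector exposes it as the <=> operator), whereas the real script lets the database do this work on 512-dimensional CLIP embeddings. The function names and file names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, stored, limit=4):
    """Return up to `limit` file names, closest embedding first."""
    ranked = sorted(stored, key=lambda name: cosine_distance(query, stored[name]))
    return ranked[:limit]


if __name__ == "__main__":
    # Made-up embeddings for illustration; real ones come from CLIP.
    stored = {
        "jump.jpg": [0.9, 0.1, 0.0],
        "cat.jpg": [0.0, 1.0, 0.1],
        "stars.jpg": [0.1, 0.0, 1.0],
    }
    query = [1.0, 0.2, 0.0]  # stands in for the embedding of "man jumping"
    print(nearest(query, stored, limit=2))
```

Because text and image embeddings from CLIP share one vector space, the same distance comparison works whether the query is text or an image.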
- Start the web app with the uvicorn server so that it can be accessed remotely
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
Go to http://<COMPUTE_INSTANCE_IP>:8000 in a web browser and request a search.
Possible ideas include:
- cat
- man jumping
- outer space
Important
Remember to delete services once done with the demo to avoid recurring charges.
Images are from Unsplash and are available for use under the Unsplash License. Images have been reduced in size.
- The Workshop: Searching for images with vector search - OpenSearch and CLIP model, which does (essentially) the same thing, but using OpenSearch and Jupyter notebooks.
- Building a movie recommendation system with Tensorflow and PGVector, which searches text and produces a web app using JavaScript.