kjaymiller/postgresql-clip-ovhcloud

 
 

Image Search Application with OVHcloud Managed Databases and OpenAI CLIP

Introduction

Welcome to the Image Search Application! This project demonstrates how to leverage OVHcloud's Managed PostgreSQL Databases alongside OpenAI's CLIP model to create an image search application.

Diagram of the demo workflow

System Architecture

The architecture consists of several key components:

  • OVHcloud Managed PostgreSQL Database: Stores image embeddings.
  • OVHcloud Object Storage: Holds the actual images. (Provided for you.)
  • OVHcloud Compute Instances: Host the application, which uses the OpenAI CLIP model to compute embeddings.

Scripts Breakdown

  • 01_create_table.py: Prepares the PostgreSQL database by enabling necessary extensions and creating tables if they do not exist.

  • 02_process_images.py: Fetches images from OVHcloud Object Storage, computes their embeddings using the CLIP model, and stores them in the database.

  • 03_db_properties.py: Verifies from the command line that search is working as expected.

  • 04_find_images.py: Generates an embedding from user input text and queries the database for matches, making the application interactive.

  • app.py: Provides a web interface where users can enter search queries and view the results.
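
As a rough illustration of what the table-creation step involves, the sketch below builds SQL statements that enable pgvector and create a table for embeddings. The table name `pictures`, the column names, and the use of the `psycopg` package are assumptions made for illustration and may differ from what 01_create_table.py actually does. The dimension 512 is the embedding size of CLIP's ViT-B/32 model.

```python
EMBEDDING_DIM = 512  # embedding size produced by CLIP ViT-B/32


def create_table_statements(table: str = "pictures", dim: int = EMBEDDING_DIM) -> list[str]:
    """Return idempotent SQL that enables pgvector and creates the table.

    The table and column names are hypothetical.
    """
    # Only allow plain identifiers, since the name is interpolated into SQL.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"filename text PRIMARY KEY, embedding vector({dim}));",
    ]


def create_table(service_uri: str) -> None:
    """Run the statements against the database."""
    import psycopg  # assumption: psycopg 3 is installed

    # The connection context manager commits on successful exit.
    with psycopg.connect(service_uri) as conn:
        for statement in create_table_statements():
            conn.execute(statement)
```

Because both statements use `IF NOT EXISTS`, running them a second time leaves an existing table untouched.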

Running the Demo

Create OVHcloud services

First, create an OVHcloud compute instance and a Managed PostgreSQL Database. You can skip these steps if you already have these services available.

  1. Create a PostgreSQL database from the OVHcloud control panel. Follow the database creation guide and the PostgreSQL configuration guide.

  2. Create an OVHcloud Public Cloud compute instance (e.g. b3-8 located in Virginia with the Ubuntu 24.04 image) connected to the public network. Useful documentation: Guide - compute instance creation and Guide - SSH key creation.

Note

Note down the IP address of the instance. You will need to add it to the list of authorized IPs for the database.

  3. Save the PostgreSQL URI from the OVHcloud control panel with the correct username and password. The scripts use this URI to connect to the database.

Note

A default user already exists; you can find it under the Users tab of the database details. To reset the user's password, click the three dots next to the user. Embed the username and password in your URI and note it down. You will need it to connect to the database.

  4. From the OVHcloud control panel, add the IP address of the compute instance obtained in step 2 to "Authorised IPs" for the PostgreSQL database.
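
When embedding the username and password in the URI as described in the note above, passwords containing characters such as `@` or `/` need to be percent-encoded or the URI will not parse correctly. The helper below is a minimal sketch of that; the function name, the default database name `defaultdb`, and the `sslmode=require` parameter are assumptions, so use the values shown in your control panel.

```python
from urllib.parse import quote


def build_pg_uri(user: str, password: str, host: str, port: int,
                 dbname: str = "defaultdb", sslmode: str = "require") -> str:
    """Assemble a PostgreSQL connection URI, escaping reserved characters."""
    # Percent-encode the credentials so characters such as '@' or '/'
    # in a password do not break the URI structure.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return (
        f"postgresql://{safe_user}:{safe_password}"
        f"@{host}:{port}/{dbname}?sslmode={sslmode}"
    )
```

For example, `build_pg_uri("avnadmin", "p@ss/word", "db.example.com", 20184)` encodes the password as `p%40ss%2Fword`; the user, host, and port here are placeholders.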

Setup Environment

  1. Connect to the compute instance via SSH and make sure Python is installed (the command below installs it if it is missing)
sudo apt update && sudo apt install python3 -y && sudo apt install python3-venv -y
  2. Clone the GitHub repository containing these scripts to the compute instance, then cd into the directory containing the scripts.
git clone --depth 1 https://github.com/pisymbol314/postgresql-clip.git
cd postgresql-clip
  3. Create a virtual environment to keep package installation local to this directory
python3 -m venv venv
  4. Activate the virtual environment.
source venv/bin/activate
  5. Install the Python packages we need
TMPDIR=/home/ubuntu python3 -m pip install -r requirements.txt --no-cache-dir

Note

If you are using a different OS image, change TMPDIR to a directory that exists on your system and has enough free space.

The Python clip.load function sometimes fails to download the CLIP model, presumably because the source server is busy. The code here uses a local copy of the model if one is available. To make that local copy:

mkdir models

curl https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt --output models/ViT-B-32.pt
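
The local-copy fallback described in the note above could look roughly like the sketch below. It relies on `clip.load` accepting either a model name or a path to a checkpoint file. The helper names `model_source` and `load_model` are hypothetical, and the repository's scripts may implement the fallback differently.

```python
from pathlib import Path

MODEL_NAME = "ViT-B/32"
LOCAL_MODEL = Path("models") / "ViT-B-32.pt"


def model_source(local_model: Path = LOCAL_MODEL) -> str:
    """Return the local checkpoint path if it exists, else the model name."""
    if local_model.is_file():
        return str(local_model)
    # No local copy: clip.load will try to download the model by name.
    return MODEL_NAME


def load_model(device: str = "cpu"):
    """Load CLIP and return the (model, preprocess) pair."""
    import clip  # assumption: OpenAI's clip package from requirements.txt

    return clip.load(model_source(), device=device)
```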
  6. Copy the template environment file, and then edit the .env file to insert the credentials (URI) needed to connect to the database. You will also need to enter the S3 object storage access key and secret.

Note

The database credentials and the object storage access key and secret are mandatory for this demo. The database credentials (URI) were obtained in step 3 of "Create OVHcloud services" and must be assigned to the variable PG_SERVICE_URI. Contact the OVHcloud US team for the object storage access key and secret, which must be assigned to the variables S3_ACCESS_KEY and S3_SECRET_KEY respectively.

Keys were provided for workshop purposes and will not be usable after 23 Jan 2025.

Do not add your keys into a forked or cloned repo.

cp .env_example .env
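
Before running the scripts, it can help to confirm that all three variables are set. The check below is a minimal sketch. It assumes the values from .env have already been loaded into the process environment (for example by a loader such as python-dotenv, which is an assumption about the setup), and `missing_settings` is a hypothetical helper, not part of the repository.

```python
import os

REQUIRED_SETTINGS = ("PG_SERVICE_URI", "S3_ACCESS_KEY", "S3_SECRET_KEY")


def missing_settings(env=None) -> list[str]:
    """Return the names of required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing settings in .env: " + ", ".join(missing))
    print("All required settings are present.")
```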

Run Workshop Scripts

  1. Enable pgvector and set up the table we need in the database
./01_create_table.py
  2. Calculate the embeddings for the pictures in OVHcloud Object Storage and upload them to the database
./02_process_images.py
  3. Optionally, check from the command line that the CLIP processing worked before starting the web app
./03_db_properties.py
  4. Run 04_find_images.py to check that everything is working. It looks for images matching the text "man jumping" and reports their filenames
./04_find_images.py
  5. Start the web app with the uvicorn server so that it can be accessed remotely
uvicorn app:app --host 0.0.0.0 --port 8000 --reload

Go to http://<COMPUTE_INSTANCE_IP>:8000 in a web browser and request a search.

Possible ideas include:

  • cat
  • man jumping
  • outer space
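
Behind each search, the text is turned into a CLIP embedding and the database returns the images whose stored embeddings are closest to it. The sketch below shows one way such a pgvector query could be built. The table and column names are hypothetical, and the cosine-distance operator `<=>` is an assumption; the repository's scripts may use a different operator or schema.

```python
def to_vector_literal(embedding) -> str:
    """Format a sequence of numbers as a pgvector text literal."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def nearest_images_query(table: str = "pictures", limit: int = 4) -> str:
    """Build a query returning the filenames closest to a given embedding.

    The embedding is passed as a query parameter (the %s placeholder)
    in the text form produced by to_vector_literal.
    """
    # Only allow plain identifiers, since the name is interpolated into SQL.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return (
        f"SELECT filename FROM {table} "
        f"ORDER BY embedding <=> %s::vector LIMIT {int(limit)};"
    )
```

With a database driver such as psycopg, the query would be executed with the literal as its single parameter, for example `conn.execute(nearest_images_query(), (to_vector_literal(embedding),))`.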

Clean Up Resources

Important

Remember to delete the services once you are done with the demo to avoid recurring charges.

The photos

Images are from Unsplash and are available for use under the Unsplash License. Images have been reduced in size.

Inspirations
