# Image recognition with Python, OpenCV, OpenAI CLIP model and PostgreSQL `pgvector`

This repository contains the working code for the example in the [blog post](https://aiven.io/developer/find-faces-with-pgvector).

The overall flow is shown below:

![Overall flow](https://github.com/Aiven-Labs/pgvector-image-recognition/blob/main/entire_flow.jpg?raw=1)

## Step 0: Install requirements

```bash
pip install opencv-python imgbeddings psycopg2-binary pillow
```

## Step 1: Face detection

Detect the faces in the pictures placed in the `storage` folder (for example, a copy of [test-image](test-image.png)) and store each cropped face under the `stored-faces` folder.

```python
# importing the required libraries
import os

import cv2

# loading the Haar cascade algorithm file into the alg variable
alg = "haarcascade_frontalface_default.xml"
# passing the algorithm to OpenCV
haar_cascade = cv2.CascadeClassifier(alg)

# making sure the output folder exists
os.makedirs("stored-faces", exist_ok=True)

i = 0
name_list = []
crop_list = []
for filename in os.listdir("storage"):
    file_name = os.path.join("storage", filename)
    # reading the image (in color, BGR channel order)
    img = cv2.imread(file_name)
    if img is None:
        # skipping files OpenCV cannot decode as images
        continue
    # creating a grayscale version of the image for the detector
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # detecting the faces
    faces = haar_cascade.detectMultiScale(
        gray_img, scaleFactor=1.05, minNeighbors=2, minSize=(100, 100)
    )

    # for each face detected
    for x, y, w, h in faces:
        # crop the image to select only the face
        cropped_image = img[y : y + h, x : x + w]
        # each face is saved as stored-faces/<i>.jpg
        target_file_name = "stored-faces/" + str(i) + ".jpg"
        name_list.append(target_file_name)
        crop_list.append(cropped_image)
        i = i + 1

# writing the cropped faces to disk
for target_file_name, cropped_image in zip(name_list, crop_list):
    cv2.imwrite(target_file_name, cropped_image)
```
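`detectMultiScale` returns one `(x, y, w, h)` box per detected face, where `x` and `y` are the column and row of the top-left corner. Because NumPy images are indexed as `img[row, column]`, the crop slices `y` first. The sketch below isolates that step in a hypothetical helper, `crop_faces` (not part of OpenCV), and also clips each box to the image bounds, a defensive extra the notebook does not need for Haar boxes. It uses only NumPy, so it can be tried on a synthetic array.

```python
import numpy as np


def crop_faces(img, faces):
    """Hypothetical helper: return one cropped array per (x, y, w, h) box.

    x and y are the column and row of the box's top-left corner, so the
    NumPy slice is img[rows, columns] = img[y:y + h, x:x + w].
    """
    height, width = img.shape[:2]
    crops = []
    for x, y, w, h in faces:
        # clip the box to the image bounds (Haar boxes normally already
        # lie inside the image; this only guards against other sources)
        x0, y0 = max(int(x), 0), max(int(y), 0)
        x1, y1 = min(int(x + w), width), min(int(y + h), height)
        # skip boxes that end up empty after clipping
        if x1 > x0 and y1 > y0:
            crops.append(img[y0:y1, x0:x1])
    return crops


# usage on a synthetic 4x6 "image": one box inside, one sticking out
img = np.arange(24).reshape(4, 6)
crops = crop_faces(img, [(1, 0, 2, 3), (4, 2, 5, 5)])
print([c.shape for c in crops])  # [(3, 2), (2, 2)]
```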


## Step 2: Embeddings Calculation

Calculate the embeddings of the stored faces and push them to PostgreSQL. Replace the `<SERVICE_URI>` placeholder with your PostgreSQL service URI.
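The `INSERT` statement in this step assumes that a `pictures` table with a pgvector `embedding` column already exists. A minimal sketch for creating it, using a hypothetical helper `create_pictures_table`: the first column name, `picture`, is an assumption (the notebook only relies on the file name being the first column), and so is the 768-dimension default, which must equal the length of the vectors you insert (check `len(embedding[0])` if unsure).

```python
def create_pictures_table(conn, dims=768):
    """Hypothetical helper: enable pgvector and create the pictures table.

    The `picture` column name and the 768 default are assumptions; dims
    must equal the length of the embedding vectors that will be inserted.
    """
    with conn.cursor() as cur:
        # pgvector must be enabled once per database
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
        # one row per stored face: file name first, then its embedding
        cur.execute(
            "CREATE TABLE IF NOT EXISTS pictures "
            f"(picture text PRIMARY KEY, embedding vector({int(dims)}))"
        )
    conn.commit()
```

For example, call `create_pictures_table(conn)` once, right after `psycopg2.connect(...)`.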

```python
# importing the required libraries
import os

import psycopg2
from imgbeddings import imgbeddings
from PIL import Image

# connecting to the database - replace <SERVICE_URI> with your PostgreSQL service URI
conn = psycopg2.connect("<SERVICE_URI>")

# loading the `imgbeddings` model once, outside the loop
ibed = imgbeddings()
cur = conn.cursor()
for filename in os.listdir("stored-faces"):
    # opening the image
    img = Image.open(os.path.join("stored-faces", filename))
    # calculating the embeddings (one row per input image)
    embedding = ibed.to_embeddings(img)
    # storing the file name and its embedding in the pictures table
    cur.execute("INSERT INTO pictures VALUES (%s, %s)", (filename, embedding[0].tolist()))
    print(filename)
conn.commit()
cur.close()
```
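If the `pictures` table has a primary key on the file-name column, re-running this cell fails on the rows that are already stored. One way around it is PostgreSQL's `INSERT ... ON CONFLICT`, sketched below in a hypothetical helper `upsert_embedding` that assumes the key column is called `picture`. The values are converted to plain Python floats because psycopg2 does not adapt NumPy scalar types by default.

```python
def upsert_embedding(cur, filename, vector):
    """Hypothetical helper: insert a face embedding, or overwrite it if the
    file name is already stored. Assumes a primary key on column `picture`."""
    cur.execute(
        "INSERT INTO pictures (picture, embedding) VALUES (%s, %s) "
        "ON CONFLICT (picture) DO UPDATE SET embedding = EXCLUDED.embedding",
        # plain floats: psycopg2 cannot adapt numpy.float32 out of the box
        (filename, [float(v) for v in vector]),
    )
```

Inside the loop, `upsert_embedding(cur, filename, embedding[0])` would take the place of the plain `INSERT`.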

## Step 3: Calculate embeddings on a new picture

Calculate the embeddings of the picture `solo-image.png`, which is used as the search query. The whole picture is embedded without running face detection, so it should contain a single face.

```python
from imgbeddings import imgbeddings
from PIL import Image

# path of the picture to search for
file_name = "solo-image.png"
# opening the image
img = Image.open(file_name)
# loading the `imgbeddings`
ibed = imgbeddings()
# calculating the embeddings
embedding = ibed.to_embeddings(img)
```
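`to_embeddings` returns a 2-D NumPy array with one row per input image, so `embedding[0]` is the vector of `solo-image.png`. The next step sends it to PostgreSQL in pgvector's text format, `[x1,x2,...]`. A small hypothetical helper, `to_vector_literal`, that builds that literal:

```python
import numpy as np


def to_vector_literal(vector):
    """Hypothetical helper: format a 1-D vector as pgvector text input,
    e.g. [0.5,-1.0,2.0]."""
    # tolist() turns NumPy scalars into plain Python floats before str()
    values = np.asarray(vector, dtype=float).tolist()
    return "[" + ",".join(str(x) for x in values) + "]"


# a 2-D array with one row, like the output of to_embeddings
print(to_vector_literal(np.array([[0.5, -1.0, 2.0]])[0]))  # [0.5,-1.0,2.0]
```

Passing `to_vector_literal(embedding[0])` as the query parameter is equivalent to building `string_representation` by hand.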

## Step 4: Find similar images by querying the PostgreSQL database using pgvector

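```python
# IPython's Image is imported under another name so it does not shadow PIL's Image
from IPython.display import Image as DisplayImage, display

cur = conn.cursor()
# pgvector text format: "[x1,x2,...]"
string_representation = "[" + ",".join(str(x) for x in embedding[0].tolist()) + "]"
# <-> is the Euclidean distance: return the closest stored face
cur.execute("SELECT * FROM pictures ORDER BY embedding <-> %s LIMIT 1;", (string_representation,))
rows = cur.fetchall()
for row in rows:
    # the first column holds the file name of the stored face
    display(DisplayImage(filename="stored-faces/" + row[0]))
cur.close()
```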
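pgvector's `<->` operator is the Euclidean (L2) distance, so `ORDER BY embedding <-> %s LIMIT 1` returns the stored face whose embedding is closest to the query. For a handful of faces, the same ranking can be reproduced in NumPy as a sanity check. The sketch below assumes the stored embeddings and file names have already been loaded into Python; the helper `nearest` and its arguments are hypothetical. pgvector also offers `<=>` for cosine distance, which ignores vector length.

```python
import numpy as np


def nearest(query, stored, names, k=1):
    """Hypothetical helper: return the k (name, distance) pairs with the
    smallest Euclidean (L2) distance to query, like ORDER BY <-> LIMIT k."""
    stored = np.asarray(stored, dtype=float)
    query = np.asarray(query, dtype=float)
    # L2 distance from the query to every stored row
    distances = np.linalg.norm(stored - query, axis=1)
    # indices sorted by increasing distance, keeping the first k
    order = np.argsort(distances, kind="stable")[:k]
    return [(names[i], float(distances[i])) for i in order]


names = ["0.jpg", "1.jpg", "2.jpg"]
stored = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
print(nearest([0.9, 1.2], stored, names)[0][0])  # 2.jpg
```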