Install **kubectl** and the **Google Cloud SDK** with the necessary authentication plugin for Google Kubernetes Engine (GKE).

In [None]:
%%bash

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates gnupg
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
sudo apt-get update && sudo apt-get install -y google-cloud-cli-gke-gcloud-auth-plugin
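
Before moving on, it can help to confirm that the tools are actually on the `PATH`. The following is a minimal sketch rather than part of the original notebook; the binary names in `REQUIRED_TOOLS` are assumed from the packages installed above, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Binary names assumed from the packages installed above.
REQUIRED_TOOLS = ["kubectl", "gcloud", "gke-gcloud-auth-plugin"]


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools are installed.")
```

If anything is reported as missing, rerun the installation cell and check its output for errors.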

**Replace** \<CLUSTER_NAME> with your cluster name, e.g. pgvector-re-cluster. Retrieve the GKE cluster's credentials using the gcloud command.

In [None]:
%%bash

export KUBERNETES_CLUSTER_NAME=<CLUSTER_NAME>
gcloud container clusters get-credentials $KUBERNETES_CLUSTER_NAME --region $GOOGLE_CLOUD_REGION

Download the dataset from GitHub.

In [None]:
%%bash

export DATASET_PATH=https://raw.githubusercontent.com/epam/kubernetes-engine-samples/recommendation-engine/ai-ml/recommendation-engine/manifests/02-notebook/dataset.json
# -f makes curl fail on an HTTP error instead of saving the error page as dataset.json
curl -fsSLO "$DATASET_PATH"
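
A quick sanity check on the downloaded file can catch a truncated or malformed dataset early. The sketch below assumes the file is a JSON array of records with the fields this notebook reads later (`title`, `category`, `description`, `gender`, `brand`, `color`); the two-record sample and the `find_incomplete_records` helper are hypothetical stand-ins, not part of the dataset or the original notebook.

```python
import json

# Fields this notebook reads from each record.
EXPECTED_FIELDS = {"title", "category", "description", "gender", "brand", "color"}


def find_incomplete_records(records):
    """Return (index, missing_fields) for every record lacking an expected field."""
    problems = []
    for index, record in enumerate(records):
        missing = EXPECTED_FIELDS - record.keys()
        if missing:
            problems.append((index, sorted(missing)))
    return problems


# Hypothetical two-record sample standing in for dataset.json.
sample = json.loads("""[
  {"title": "Trail Jacket", "category": "Outerwear", "description": "Light shell",
   "gender": "Unisex", "brand": "Acme", "color": "['Black']"},
  {"title": "Wool Cap", "category": "Accessories", "brand": "Acme"}
]""")

print(find_incomplete_records(sample))
```

To run the same check on the real file, replace `sample` with `json.load(open("dataset.json"))`.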

Create a .env file with the environment variables required to connect to PostgreSQL in the Kubernetes cluster.

In [None]:
%%bash

echo POSTGRES_ENDPOINT=$(kubectl get pod -l cnpg.io/instanceRole=primary -n pg-ns -o=jsonpath="{.items[0].status.podIP}") > .env
echo DATABASE_NAME=app >> .env
echo DBUSERNAME=$(kubectl get secret gke-pg-cluster-superuser -n pg-ns --template={{.data.username}} | base64 -d) >> .env
echo DBPASSWORD=$(kubectl get secret gke-pg-cluster-superuser -n pg-ns --template={{.data.password}} | base64 -d) >> .env
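
If one of the `kubectl` lookups fails, the corresponding line is still written to `.env`, only with an empty value. A minimal sketch of a check for that case follows; `parse_env_text` and `empty_keys` are hypothetical helpers, and the sample content is invented for illustration.

```python
REQUIRED_KEYS = ["POSTGRES_ENDPOINT", "DATABASE_NAME", "DBUSERNAME", "DBPASSWORD"]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def empty_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not values.get(key)]


# Hypothetical file content in which the password lookup failed.
sample_env = (
    "POSTGRES_ENDPOINT=10.0.0.5\n"
    "DATABASE_NAME=app\n"
    "DBUSERNAME=postgres\n"
    "DBPASSWORD=\n"
)
print(empty_keys(parse_env_text(sample_env)))
```

Against the real file, the call would be `empty_keys(parse_env_text(open(".env").read()))`; an empty list means every variable has a value.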

Install the required Python libraries:

In [None]:
! pip install --upgrade-strategy only-if-needed python-dotenv psycopg-binary psycopg langchain langchain-postgres langchain-community langchain-google-vertexai jq

Import the Python libraries:

In [None]:
import os

from dotenv import load_dotenv
from langchain_community.document_loaders import JSONLoader
from langchain_core.prompts import PromptTemplate, format_document
from langchain_google_vertexai import VertexAI, VertexAIEmbeddings
from langchain_postgres.vectorstores import PGVector

Load and parse the dataset with the JSON loader. Each record's title becomes the document content, and only specific fields are kept as metadata: category, description, gender, brand and color.

In [None]:
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["category"] = record.get("category")
    metadata["description"] = record.get("description")
    metadata["gender"] = record.get("gender")
    metadata["brand"] = record.get("brand")
    # Strip list punctuation ([, ], ') from the color value; tolerate a missing value.
    metadata["color"] = "".join(c for c in (record.get("color") or "") if c not in "[]'")
    return metadata

loader = JSONLoader(
    file_path='/content/dataset.json',
    jq_schema='.[]',
    content_key='title',
    metadata_func=metadata_func)
data = loader.load()
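
To see what the metadata function does to a single record, here is a standalone sketch that needs no LangChain installation. The sample record, the initial metadata keys (`source`, `seq_num`) and the helper names `clean_color` and `build_metadata` are assumptions made for illustration.

```python
def clean_color(raw):
    """Remove the characters [, ] and ' from a color value; None becomes ''."""
    return "".join(c for c in (raw or "") if c not in "[]'")


def build_metadata(record, metadata):
    """Copy the selected fields from a record into the metadata dict."""
    for field in ("category", "description", "gender", "brand"):
        metadata[field] = record.get(field)
    metadata["color"] = clean_color(record.get("color"))
    return metadata


# Hypothetical record; the color is assumed to arrive as a stringified list.
record = {
    "title": "Trail Jacket",
    "category": "Outerwear",
    "description": "Light shell",
    "gender": "Unisex",
    "brand": "Acme",
    "color": "['Black', 'White']",
}

print(build_metadata(record, {"source": "dataset.json", "seq_num": 1}))
```

The title is not copied into the metadata because the loader already uses it as the document content.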

Configure two prompt templates: the first one is used to interact with Gemini, and the second one turns each document into a single-line string. Also define a function that merges the items found in the database into one multiline string.

In [None]:
llm_prompt_template = PromptTemplate.from_template("""
    You're a helpful assistant who can recommend things in addition to those already chosen.
    Already chosen item:
    {chosen_item}

    Available items:
    {available_items}

    Please check all available items and find the {max_recommendations} most suitable item or items for the chosen one.
    Try to ensure that the recommended item or items will match the brand, color and purpose well.
    Your answer should contain the chosen item, the recommended items and an example of how to use them all together.
    Generate a draft response using the selected information.
    It should be easy to understand your answer. Start your answer with the phrase: "For <chosen_item> I would recommend <recommended>:"
    Don't forget to mention all {max_recommendations} recommendations.
    Keep your answer to four or five sentences if possible. If not, try to keep the answer short.
    Generate your final response after adjusting it to increase accuracy and relevance.
    Now only show your final response!""")

data_format_prompt_template = PromptTemplate.from_template("| {page_content} | Category: {category} | Color: {color} | Gender: {gender} | Brand: {brand} | Description: {description} |\n")

def format_data(documents):
    result=""
    for doc in documents:
        result += format_document(doc, data_format_prompt_template)
    return result
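
The row format can be previewed without LangChain. The sketch below reproduces the same template with plain `str.format`; the `Doc` class is a hypothetical stand-in for a LangChain document, and the sample values are invented.

```python
from dataclasses import dataclass, field

ROW_TEMPLATE = (
    "| {page_content} | Category: {category} | Color: {color} | "
    "Gender: {gender} | Brand: {brand} | Description: {description} |\n"
)


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_rows(documents):
    """Render each document as one table-like row and concatenate the rows."""
    return "".join(
        ROW_TEMPLATE.format(page_content=doc.page_content, **doc.metadata)
        for doc in documents
    )


docs = [
    Doc("Trail Jacket", {"category": "Outerwear", "color": "Black",
                         "gender": "Unisex", "brand": "Acme",
                         "description": "Light shell"}),
]
print(format_rows(docs), end="")
```

One row per item keeps the list of available items compact when it is pasted into the LLM prompt.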

Declare two models from the Vertex AI Model Garden: Gecko (a text embedding model) and Gemini Flash (a lightweight version of Gemini Pro).

In [None]:
embeddings = VertexAIEmbeddings(model_name="textembedding-gecko@latest")
llm = VertexAI(model_name="gemini-1.5-flash")
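
An embedding model maps text to a vector so that similar texts end up close together. As a rough illustration of how two such vectors can be compared, the sketch below computes cosine similarity on toy three-dimensional vectors. The vectors are invented, real embeddings have far more dimensions, and the distance measure actually used by the vector store is an assumption here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three product descriptions.
jacket = [0.9, 0.1, 0.0]
coat = [0.8, 0.2, 0.1]
mug = [0.0, 0.1, 0.9]

print("jacket vs coat:", round(cosine_similarity(jacket, coat), 3))
print("jacket vs mug: ", round(cosine_similarity(jacket, mug), 3))
```

The jacket and coat vectors point in nearly the same direction, so they score much higher than the jacket and mug pair.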

Load environment variables from the .env file, establish a connection to a PostgreSQL database and upload data from the dataset.

In [None]:
load_dotenv()

CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver="psycopg",
    host=os.environ.get("POSTGRES_ENDPOINT"),
    port=5432,
    database=os.environ.get("DATABASE_NAME"),
    user=os.environ.get("DBUSERNAME"),
    password=os.environ.get("DBPASSWORD"),
)
db = PGVector.from_documents(
    embedding=embeddings,
    documents=data,
    collection_name="products",
    connection=CONNECTION_STRING,
)
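
For troubleshooting, it can be useful to know what the connection string looks like. The sketch below builds a SQLAlchemy-style URL by hand, on the assumption that the helper above produces the form `postgresql+psycopg://user:password@host:port/database`. Whether the helper escapes special characters is not confirmed here, so this sketch escapes the user name and password explicitly. `build_connection_url` and the sample values are hypothetical.

```python
from urllib.parse import quote_plus


def build_connection_url(driver, host, port, database, user, password):
    """Build a SQLAlchemy-style PostgreSQL URL, escaping the credentials."""
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical values; the password contains characters that need escaping.
url = build_connection_url(
    driver="psycopg",
    host="10.0.0.5",
    port=5432,
    database="app",
    user="app",
    password="p@ss/word",
)
print(url)
```

A password that contains `@` or `/` would otherwise be misread as part of the host or path.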

Set your preferences for the recommendation searches, or leave a variable empty to ignore it. In the example below, recommended items should be black and made by Google if possible, while the gender is left unspecified and therefore falls back to that of the original item.

In [None]:
preferred_color="black"
preferred_brand="google"
preferred_gender=""

Set the maximum number of recommended items.

In [None]:
max_recommendations=2

Define the query generator and recommendation engine functions.

In [None]:
def get_query_string(original_item):
    search_query_string=original_item.metadata['description']
    search_query_string+= ", Color: " + (preferred_color if preferred_color != "" else original_item.metadata['color'])
    search_query_string+= ", Brand: " + (preferred_brand if preferred_brand != "" else original_item.metadata['brand'])
    search_query_string+= ", Gender: " + (preferred_gender if preferred_gender != "" else original_item.metadata['gender'])
    return search_query_string

def get_recommendation(original_item, max_recommendations):
    original_item_formatted=format_data([original_item])
    search_query_string=get_query_string(original_item)
    found_docs = db.similarity_search(
        search_query_string,
        k=max_recommendations*5,
        filter={"description": {"$ne": original_item.metadata['description']}}
    )
    found_docs_formatted=format_data(found_docs)
    llm_prompt = llm_prompt_template.format(chosen_item=original_item_formatted, available_items=found_docs_formatted, max_recommendations=max_recommendations)
    print(f"{original_item.page_content}:")
    output = llm.invoke(llm_prompt)
    print(output)
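
The fallback logic in the query generator can be tried in isolation. The sketch below takes the preferences as parameters instead of reading the notebook's global variables; `build_query` and the sample metadata are hypothetical.

```python
def build_query(metadata, color="", brand="", gender=""):
    """Build the search text, preferring explicit values over the item's own."""
    parts = [metadata["description"]]
    for label, preferred, key in (
        ("Color", color, "color"),
        ("Brand", brand, "brand"),
        ("Gender", gender, "gender"),
    ):
        # An empty preference falls back to the original item's value.
        parts.append(f"{label}: {preferred or metadata[key]}")
    return ", ".join(parts)


# Hypothetical item metadata.
item = {"description": "Light shell", "color": "Red",
        "brand": "Acme", "gender": "Unisex"}

print(build_query(item, color="black", brand="google"))
print(build_query(item))
```

With color and brand set, only the gender comes from the original item; with no preferences at all, the query simply describes the original item.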

Take five products from the dataset (every 15th item, starting from the first) and find recommended items for them.

In [None]:
for i in range(5):
    get_recommendation(data[i*15], max_recommendations)
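
The loop above assumes the dataset has at least 61 items, because the last index it reads is 60. A minimal sketch of an index helper that stays within bounds for smaller datasets follows; `sample_indices` is a hypothetical name.

```python
def sample_indices(total, count=5, step=15):
    """Return up to `count` indices spaced `step` apart that are below `total`."""
    return [i * step for i in range(count) if i * step < total]


print(sample_indices(100))
print(sample_indices(40))
```

In the notebook, the loop could then iterate over `sample_indices(len(data))` instead of `range(5)`.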