<h1> Similarity search using Amazon Bedrock Embeddings </h1>
<br>

<p> This notebook walks you through performing similarity search over vector embeddings stored in Amazon RDS with the pgvector extension. You will use Amazon Bedrock Embeddings to retrieve products from the product catalog that are similar to your search keyword. </p>
    
<p> Before starting, please make sure this notebook is using the <b>conda_python3</b> kernel (shown at the top right)! </p>

<p> To run this notebook, go to Cell -> Run All. Inspect the output of each cell block. </p>

<b> Please read the following instructions carefully! </b>
    <ul>
    <li>We highly recommend running all cells at once and inspecting the output, rather than running the cells individually. This saves time and avoids issues.
    <li>This notebook is for your understanding. Running it is NOT required to proceed with the next steps of your workshop.
    <li>If your notebook does not run as expected or you run into any errors, please proceed with the next steps provided in the Workshop instructions.
    <li>If you choose to run the notebooks, please read the comments in the markdown and inspect the output of each cell.
    </ul>

<h3> Install required dependencies </h3>
<p> <b>Note:</b> If you notice any ERRORs from the following cell, ignore them and proceed with the next cells.</p><br>

In [None]:
%pip install --quiet --no-build-isolation --upgrade \
    "boto3==1.28.63" \
    "awscli==1.29.63" \
    "botocore==1.31.63" \
    "langchain==0.0.309" \
    "psycopg2-binary==2.9.9" \
    "pgvector==0.2.3" \
    "numpy==1.26.1"

<h3> Import required packages </h3>

In [None]:
import json
import os
import sys
import boto3
import botocore
from langchain import PromptTemplate
from langchain.llms.bedrock import Bedrock

# For vector search
from langchain.embeddings import BedrockEmbeddings
import psycopg2
from pgvector.psycopg2 import register_vector
import numpy as np

# For image operations
from PIL import Image
import base64
import io
import requests

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

<h3> Initialize Bedrock client </h3>

In [None]:
boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

#### Initialize Amazon Bedrock Embeddings model 

We use the Titan Embeddings model here to convert strings into vector embeddings for vector (similarity) search.

In [None]:
modelId = "amazon.titan-embed-text-v1"
bedrock_embeddings = BedrockEmbeddings(model_id=modelId, client=boto3_bedrock)

#### Define a search keyword

In [None]:
keyword = "floral prints"
print(keyword)

#### Now let's create a vector embedding for this keyword using Bedrock

In [None]:
search_embedding = list(bedrock_embeddings.embed_query(keyword))
print(search_embedding)
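
For reference, the `embed_query` call above can be approximated with a direct Bedrock Runtime request. The sketch below is a minimal illustration, not the LangChain implementation. The helper names `build_titan_request`, `parse_titan_response`, and `embed_text`, as well as the `FakeBedrockClient` class, are hypothetical; the fake client exists only so the code runs without AWS credentials. It assumes the Titan text embeddings model accepts a JSON body with an `inputText` field and returns a JSON body with an `embedding` field.

```python
import io
import json


def build_titan_request(text):
    """Serialize the request body assumed for Titan text embeddings."""
    return json.dumps({"inputText": text})


def parse_titan_response(response):
    """Read the streamed response body and return the embedding list."""
    payload = json.loads(response["body"].read())
    return payload["embedding"]


def embed_text(client, text, model_id="amazon.titan-embed-text-v1"):
    """Request an embedding for `text` from a Bedrock Runtime style client."""
    response = client.invoke_model(
        modelId=model_id,
        body=build_titan_request(text),
        accept="application/json",
        contentType="application/json",
    )
    return parse_titan_response(response)


class FakeBedrockClient:
    """Hypothetical stand-in for a real client; returns a tiny made-up vector."""

    def invoke_model(self, modelId, body, accept, contentType):
        text = json.loads(body)["inputText"]
        # Made-up 3-dimensional "embedding" derived from the text length.
        fake = {"embedding": [float(len(text)), 0.0, 1.0]}
        return {"body": io.BytesIO(json.dumps(fake).encode("utf-8"))}


vector = embed_text(FakeBedrockClient(), "floral prints")
print(vector)
```

With real credentials, a `boto3.client("bedrock-runtime")` client could be passed in place of the fake one; the real model returns a much longer vector than this toy example.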

We will use this search embedding to query the RDS vector database. The database is already prepopulated with embeddings for all the products in [this](https://github.com/zalandoresearch/feidegger/blob/master/data/FEIDEGGER_release_1.2.json) catalog, generated from the [FEIDEGGER](https://github.com/zalandoresearch/feidegger/tree/master) dataset.

Please note that, to save time, all 8500+ vector embeddings are pre-populated in your Amazon RDS database instance; generating embeddings for this many products takes about 20-30 minutes. To store and query these embeddings, your RDS database needs the [pgvector](https://github.com/pgvector/pgvector) extension, which has also been pre-installed in your RDS instance.
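
To give a sense of what this pre-populated setup might look like, here is a hedged sketch of the SQL involved. Only the table name `vector_products` and the column names `id`, `url`, `description`, and `descriptions_embeddings` come from this notebook; the column types, the `ivfflat` index, and its `lists` value are assumptions for illustration, and the workshop's actual schema may differ. The dimension 1536 matches the output size of `amazon.titan-embed-text-v1`. The helper `setup_statements` is hypothetical and only builds the statements; it does not execute them.

```python
TITAN_EMBEDDING_DIM = 1536  # output size of amazon.titan-embed-text-v1


def setup_statements(dim=TITAN_EMBEDDING_DIM):
    """Return SQL statements (assumed schema) for a pgvector-backed product table."""
    return [
        # Enable the pgvector extension in the current database.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # Column types are assumptions; only the names appear in this notebook.
        f"""CREATE TABLE IF NOT EXISTS vector_products (
            id bigserial PRIMARY KEY,
            url text,
            description text,
            descriptions_embeddings vector({dim})
        );""",
        # Optional approximate index for L2 distance (assumed, not confirmed).
        """CREATE INDEX IF NOT EXISTS vector_products_l2_idx
            ON vector_products
            USING ivfflat (descriptions_embeddings vector_l2_ops)
            WITH (lists = 100);""",
    ]


for statement in setup_statements():
    print(statement)
```

Each statement could be run with a psycopg2 cursor's `execute` method against a database where you have the required privileges.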

#### Now let's connect to Amazon RDS and query the embeddings based on the search keyword

In [None]:
# Initialize secrets manager
secrets = boto3.client('secretsmanager')

sm_response = secrets.get_secret_value(SecretId='postgresdb-secrets')

database_secrets = json.loads(sm_response['SecretString'])

dbhost = database_secrets['host']
dbport = database_secrets['port']
dbuser = database_secrets['username']
dbpass = database_secrets['password']
dbname = database_secrets['vectorDbIdentifier']

# Connect to the RDS vectordb database 
dbconn = psycopg2.connect(host=dbhost, user=dbuser, password=dbpass, port=dbport, database=dbname, connect_timeout=10)
dbconn.set_session(autocommit=True)
register_vector(dbconn)
cur = dbconn.cursor()

Run the search query, which performs a similarity search over the pre-populated vector embeddings using the embedding of the search keyword.

In [None]:
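# Limit the search result to the 2 nearest products for now.
# `<->` is pgvector's Euclidean (L2) distance operator; a smaller distance means more similar.
cur.execute("""SELECT id, url, description, descriptions_embeddings
               FROM vector_products
               ORDER BY descriptions_embeddings <-> %s LIMIT 2;""",
            (np.array(search_embedding),))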

# Fetch search result
dbresult = cur.fetchall()
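
Conceptually, `ORDER BY descriptions_embeddings <-> %s limit 2` ranks every row by Euclidean (L2) distance to the query vector and keeps the two closest. pgvector also offers `<=>` for cosine distance and `<#>` for negative inner product. The NumPy sketch below mimics the L2 ranking by brute force on a toy 2-dimensional catalog. The function `top_k_l2` and the sample vectors are hypothetical and stand in for the 1536-dimensional embeddings stored in the table.

```python
import numpy as np


def top_k_l2(query, vectors, k=2):
    """Return (row_index, distance) pairs for the k rows nearest to `query`."""
    vectors = np.asarray(vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every row.
    distances = np.linalg.norm(vectors - query, axis=1)
    # Indices of the k smallest distances, nearest first.
    order = np.argsort(distances)[:k]
    return [(int(i), float(distances[i])) for i in order]


# Toy "catalog" of four 2-dimensional embeddings (made-up values).
catalog = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [3.0, 4.0],
    [0.9, 1.2],
])
query = np.array([1.0, 1.0])

results = top_k_l2(query, catalog, k=2)
for index, distance in results:
    print(f"row {index}: distance {distance:.4f}")
```

The database performs this ranking inside PostgreSQL, so only the matching rows are sent back to the notebook.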

#### Display similarity search result

This search result contains the top 2 products most similar to our search keyword.

In [None]:
for row in dbresult:
    # Get similar product IDs
    product_item_id = row[0]

    # Get similar product descriptions
    desc = row[2]

    # Get image from URL (query string dropped); fail fast on HTTP errors
    url = row[1].split('?')[0]
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    img = Image.open(io.BytesIO(response.content))
    img = img.resize((256, 256))

    # Print similarity search results
    print(f"Product ID: {product_item_id}")
    print("\n" + desc)
    img.show()


<h3> You've successfully created a vector search application using Amazon RDS pgvector extension with Amazon Bedrock Embeddings!</h3>

<p> Please stop the notebook kernel before proceeding. </p>

<h4> Now, let's learn how to integrate Amazon Bedrock into your web application to do the same. Please go back to Workshop Studio and follow the instructions to replicate this code in your Cloud9 environment. </h4>