# How to Use Bedrock Embeddings to Recommend Ads for Content (Blogs and Articles) - Building with Bedrock Embeddings

In this demo notebook, we use the Bedrock Python SDK (boto3) to generate embeddings for a news article and several ads, then rank the ads by how similar they are to the article.

1. [Set Up](#1.-Set-Up)
2. [Embeddings Generation](#2.-Embeddings-Generation)
3. [Items Similarity](#3.-Items-Similarity)

Note: This notebook was tested in Amazon SageMaker Studio with Python 3 (Data Science 3.0) kernel.

### 1. Set Up

---
Before running the notebook for the first time, run this cell to upgrade boto3 and botocore to versions that include the Bedrock clients.

---

In [None]:
%pip install --upgrade pip
%pip install boto3 --upgrade
%pip install botocore --upgrade

Let's check which versions of boto3 and botocore are installed:

In [5]:
import boto3
import botocore

# Get the Boto3 version
boto3_version = boto3.__version__

# Get the Botocore version
botocore_version = botocore.__version__


# Print the Boto3 version
print("Current Boto3 Version:", boto3_version)

# Print the Botocore version
print("Current Botocore Version:", botocore_version)

Current Boto3 Version: 1.28.67
Current Botocore Version: 1.31.67


Let's create a Bedrock client and list the available foundation models to confirm that the endpoint works:

In [3]:
import json

import boto3

# Bedrock control-plane client (used here to list foundation models)
bedrock = boto3.client(
    service_name='bedrock',
    region_name='us-east-1',
    endpoint_url='https://bedrock.us-east-1.amazonaws.com'
)

In [4]:
bedrock.list_foundation_models()

{'ResponseMetadata': {'RequestId': '3b4ae905-a42e-46df-83b4-b29fb1b53927',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Fri, 20 Oct 2023 11:16:23 GMT',
   'content-type': 'application/json',
   'content-length': '6069',
   'connection': 'keep-alive',
   'x-amzn-requestid': '3b4ae905-a42e-46df-83b4-b29fb1b53927'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large',
   'modelName': 'Titan Text Large',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities': ['TEXT'],
   'responseStreamingSupported': True,
   'customizationsSupported': ['FINE_TUNING'],
   'inferenceTypesSupported': ['ON_DEMAND']},
  {'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-e1t-medium',
   'modelId': 'amazon.titan-e1t-medium',
   'modelName': 'Titan Text Embeddings',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities'
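The full listing is long. A minimal sketch for narrowing it to embedding models, assuming each entry in `modelSummaries` carries an `outputModalities` list and that embedding models report `'EMBEDDING'` there (the truncated output above does not show that value, so treat it as an assumption). The helper name `embedding_model_ids` is hypothetical.

```python
def embedding_model_ids(response):
    """Return the modelId of every model whose outputModalities include 'EMBEDDING'.

    `response` is the dict returned by bedrock.list_foundation_models().
    Assumption: embedding models list 'EMBEDDING' in outputModalities.
    """
    return [
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if "EMBEDDING" in summary.get("outputModalities", [])
    ]

# Usage with the live client created above (requires Bedrock access):
# print(embedding_model_ids(bedrock.list_foundation_models()))
```

The Bedrock API should also be able to do this filtering server side via `bedrock.list_foundation_models(byOutputModality='EMBEDDING')`; the client-side helper is useful when you already have the full response in hand.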

### 2. Embeddings Generation

Embeddings are a key concept in generative AI and in machine learning more broadly. An embedding represents an object (such as a word, image, or video) as a vector in a vector space. Semantically similar objects typically have embeddings that lie close together in that space, which makes embeddings very useful for use cases such as semantic search, recommendations, and classification.
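To make "close together" concrete: closeness between embeddings is commonly measured with cosine similarity, the dot product of two vectors divided by the product of their lengths. The three-dimensional vectors below are made-up toy values chosen for illustration, not real Titan embeddings (which are much longer).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1]."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings (not produced by any model)
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.95]

print(round(cosine(cat, kitten), 3))  # high: vectors point in a similar direction
print(round(cosine(cat, car), 3))     # lower: vectors point in different directions
```

Section 3 applies the same measure to the real article and ad embeddings.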

We will use the Titan Embeddings model to generate our embeddings.

In [8]:
import json

import boto3

# Runtime client used to invoke models; the region matches the Bedrock client above
bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')

def get_embedding(body, modelId, accept, contentType):
    """Invoke an embedding model and return the 'embedding' vector from its JSON response."""
    response = bedrock_client.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    embedding = response_body.get('embedding')
    return embedding

# Titan Embeddings model ID and the request/response content types
modelId = 'amazon.titan-embed-g1-text-02'
accept = 'application/json'
contentType = 'application/json'
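`get_embedding` sends a JSON body of the form `{"inputText": ...}` and reads the vector from the `"embedding"` key of the JSON returned in `response['body']`. A minimal sketch of that round trip that runs offline, assuming a stand-in client: `StubRuntimeClient` and `get_embedding_with` are hypothetical names, and the stub only mimics the response shape `get_embedding` relies on (a `body` object with a `.read()` method). It does not call Bedrock, and its "embedding" is a fake three-number vector.

```python
import io
import json

class StubRuntimeClient:
    """Hypothetical stand-in for boto3.client('bedrock-runtime'), for offline testing only."""

    def invoke_model(self, body, modelId, accept, contentType):
        text = json.loads(body)["inputText"]
        # Fake "embedding" derived from the text; a real model returns a long float vector
        fake_vector = [float(len(text)), float(text.count(" ")), 1.0]
        payload = json.dumps({"embedding": fake_vector}).encode("utf-8")
        # Mimic the streaming body: an object with a .read() method returning bytes
        return {"body": io.BytesIO(payload)}

def get_embedding_with(client, text, model_id,
                       accept="application/json", content_type="application/json"):
    """Same request/response handling as get_embedding, but with the client passed in."""
    body = json.dumps({"inputText": text})
    response = client.invoke_model(body=body, modelId=model_id,
                                   accept=accept, contentType=content_type)
    response_body = json.loads(response.get("body").read())
    return response_body.get("embedding")

vector = get_embedding_with(StubRuntimeClient(), "hello world", "amazon.titan-embed-g1-text-02")
print(vector)  # [11.0, 1.0, 1.0]
```

Passing the client in, instead of relying on the global `bedrock_client`, lets the JSON handling be checked without AWS credentials; in the notebook you would pass `bedrock_client` itself.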

In [9]:
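# Sample article text (Indonesian) from the Kompas.com digital news site: an LSI survey
# comparing 2024 presidential candidates head to head. The text is kept in its original language.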

article_text = """
JAKARTA, KOMPAS.com - Hasil jajak pendapat Lembaga Survei Indonesia (LSI) memperlihatkan, elektabilitas Prabowo Subianto unggul dibandingkan dengan Ganjar Pranowo dalam simulasi survei elektabilitas kandidat calon presiden (capres) Pemilu 2024 secara head to head atau berhadapan. Bakal capres Koalisi Indonesia Maju itu mengantongi elektabilitas 49,2 persen, terpaut 11,4 persen dibandingkan elektabilitas Ganjar yang berada di angka 37,8 persen. Ketika dihadapkan dengan kandidat bakal capres Koalisi Perubahan untuk Persatuan, Anies Baswedan, Prabowo lagi-lagi unggul dengan elektabilitas 52,6 persen. Sementara, angka elektoral Anies terpaut 19,6 persen di bawah Prabowo yakni 33,0 persen. Namun, meski kalah dari Prabowo, elektabilitas Ganjar masih unggul atas Anies. Dalam simulasi head to head, bakal capres PDI Perjuangan tersebut mencatatkan elektabilitas 47,1 persen, sedangkan Anies 37,5 persen. Baca juga: Cerita Mahfud Saat Diajak Jadi Cawapres Anies dan Prabowo Berikut simulasi head to head antara tiga bakal capres Pemilu Presiden (Pilpres) 2024 menurut survei LSI: Prabowo Subianto Vs Ganjar Pranowo Prabowo Januari 2021: 50,4 persen Mei-Juni 2022: 44,3 persen April 2023: 49,2 persen Oktober 2023: 49,2 persen Ganjar Januari 2021: 32,0 persen Mei-Juni 2022: 39,9 persen April 2023: 39,7 persen Oktober 2023: 37,8 persen Prabowo Subianto Vs Anies Baswedan Prabowo Januari 2021: 43,4 persen Mei-Juni 2022: 44,2 persen April 2023: 51,7 persen Oktober 2023: 52,6 persen Anies Januari 2021: 36,7 persen Mei-Juni 2022: 37,7 persen April 2023: 35,8 persen Oktober 2023: 33,0 persen Ganjar Pranowo Vs Anies Baswedan Ganjar Januari 2021: 34,1 persen Mei-Juni 2022: 41,5 persen April 2023: 46,7 persen Oktober 2023: 47,1 persen Anies Januari 2021: 44,7 persen Mei-Juni 2022: 40,5 persen April 2023: 39,2 persen Oktober 2023: 37,5 persen
 Artikel ini telah tayang di Kompas.com dengan judul "Survei LSI: Prabowo Menang "Head to Head" Lawan Ganjar dan Anies", Klik untuk baca: https://nasional.kompas.com/read/2023/10/20/11390371/survei-lsi-prabowo-menang-head-to-head-lawan-ganjar-dan-anies.
 Kompascom+ baca berita tanpa iklan: https://kmp.im/plus6
Download aplikasi: https://kmp.im/app6
"""

body = json.dumps({"inputText": article_text})

embedding_article = get_embedding(body, modelId, accept, contentType)
print(embedding_article)


[0.024377882, -0.25759548, -0.07961697, 0.045563877, 0.006130643, -0.51634836, 0.28105107, 2.6437972e-05, -0.0790654, -0.14275897, 0.011971932, 0.07021756, 0.20090061, 0.11266638, -0.5691551, -0.41847512, 0.31025752, 0.17111544, -0.11425781, -0.04401087, 0.25741464, -0.22645399, 0.42824075, 0.40606916, 0.025571473, 0.12832755, 0.5240162, -0.5504919, -0.10411242, 0.02777778, 0.2638889, 0.48509836, 0.37405962, 0.29144967, 0.5931713, 0.10760273, -0.53920716, -0.4037905, 0.19354022, -0.3408565, -0.32208478, -0.47670716, -0.15395327, 0.025824659, 0.30088976, 0.06278935, 0.34429252, 0.059461802, 0.36523438, -0.24269386, -0.16145833, 0.17332175, -0.38136572, -0.13570601, -0.13769531, -0.47634548, 0.21296296, -0.07501447, -0.29806858, -0.07741066, -0.089228876, 0.18684897, 0.09606482, 0.30743635, -0.0029387297, 0.47077549, -0.0055248123, -0.004421658, 0.21317998, 0.25614873, 0.27383536, -0.14946832, 0.1730324, 0.25321904, -0.0041775163, -0.13357204, -0.06734665, 0.0580693, 0.26931423, 0.583478

In [10]:
# ads1 embedding: Telkomsel ad (Indonesian) for a Rp25,000 one-month, ad-free Smule VIP karaoke package

ads1 = """
Telkomsel Ads : Suka nyanyi pakai Smule tapi lagunya terbatas & sering muncul iklan? Tenang, sekarang kamu bisa nikmatin jutaan lagu tanpa jeda iklan selama 1 bulan pakai paket MusicMAX Smule VIP hanya dengan Rp25.000! Yuk, aktifkan paketnya di aplikasi MyTelkomsel. #TerusBergerakMaju
"""

body = json.dumps({"inputText": ads1})

embedding_ads1 = get_embedding(body, modelId, accept, contentType)
print(embedding_ads1)

[-0.49804688, 0.16699219, 0.51171875, 0.3984375, 0.008422852, -0.16308594, 0.09033203, 0.00023651123, -0.041259766, 0.21972656, 0.296875, -0.23339844, 0.38085938, -0.14160156, -0.16894531, -0.45898438, -0.25390625, 0.20117188, -0.29296875, 0.0031280518, -0.18847656, -0.2265625, 0.088378906, -0.09423828, -0.012451172, 0.015991211, -0.17675781, 0.24023438, -0.10986328, -0.26757812, 0.00970459, 1.0078125, 0.6171875, 0.09033203, 0.34179688, 0.50390625, -0.20019531, -0.30859375, 0.45507812, 0.111816406, -0.54296875, 0.13574219, -0.140625, -0.095703125, 0.52734375, -0.27929688, 0.18554688, 0.18652344, 0.78125, 0.26757812, -0.1640625, -0.0015792847, 0.41601562, 0.23339844, -0.107910156, -0.060791016, 0.70703125, -0.27148438, 0.061523438, -0.15527344, 0.027709961, -0.55859375, 0.008728027, -0.37304688, -0.04638672, 0.38867188, -0.3203125, -0.20898438, -0.015014648, -0.31835938, 0.36328125, -0.35546875, -0.44140625, 0.22363281, -0.59375, -0.14746094, 0.5078125, 0.35351562, 0.46484375, 1.046875,

In [11]:
# ads2 embedding: First Media ad (Indonesian) offering TV-channel and dessert vouchers for National Customer Day

ads2 = """
First Media ads : Hi First People! Menyambut Hari Pelanggan Nasional, klaim bonus First Club berupa voucher Bebas Akses Semua TV Channel dan bonus First Privilege berupa voucher 1 Puyo Silky Dessert GRATIS di aplikasi My FirstMedia 1-7 September. Download aplikasi My FirstMedia sekarang!
"""

body = json.dumps({"inputText": ads2})

embedding_ads2 = get_embedding(body, modelId, accept, contentType)
print(embedding_ads2)

[0.017333984, 0.21484375, 0.27929688, 0.032470703, -0.0017776489, -0.10253906, 0.084472656, 0.00020503998, -0.14941406, 0.16113281, 0.578125, 0.11279297, 0.26757812, 0.20996094, 0.052246094, -0.4140625, -0.12402344, 0.42578125, 0.014099121, 0.35742188, -0.053955078, 0.17382812, -0.5234375, -0.234375, -0.045898438, -0.546875, -0.19238281, -0.005218506, 0.004638672, -0.26367188, -0.59765625, 0.3984375, 0.041992188, 0.17480469, -0.014831543, 0.041503906, -0.08154297, -0.6484375, 0.20800781, -0.27539062, 0.13671875, 0.07421875, -0.024047852, 0.10644531, 0.5078125, -0.25976562, 0.2421875, 0.068847656, 0.53125, 0.110839844, -0.076171875, 0.30664062, 0.6796875, -0.014221191, 0.11376953, -0.31835938, 0.5625, -0.03930664, -0.32617188, -0.30273438, 0.30078125, -0.31054688, -0.25195312, -0.18554688, 0.07763672, 0.32617188, -0.041015625, 0.08642578, -0.26171875, -0.01550293, 0.23242188, 0.017333984, -0.125, -0.111816406, 0.12890625, -0.13476562, 0.2734375, -0.25585938, 0.10253906, 0.55078125, -0.1

In [12]:
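# ads3 embedding (full text, including the @handle and link): BCA bank reply (Indonesian) pointing a customer to savings-product information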
ads3 = """
BCA ads : Selamat sore,
@pelangganwarnet
. Terima kasih sudah berminat menjadi nasabah BCA. Informasi produk tabungan yang tersedia di BCA bisa cek link berikut: https://bca.co.id/id/individu/pr
"""

body = json.dumps({"inputText": ads3})

embedding_ads3 = get_embedding(body, modelId, accept, contentType)
print(embedding_ads3)




[-0.09326172, 0.3125, -0.140625, 0.12207031, -0.43359375, -0.18261719, -0.07128906, 7.677078e-05, 0.24121094, -0.022094727, 0.640625, 0.048339844, 0.22558594, 0.030273438, 0.068847656, -0.29492188, 0.068847656, 0.24804688, -0.27929688, 0.42382812, 0.12451172, -0.13085938, -0.47265625, 0.0012741089, 0.09716797, 0.08154297, 0.18457031, 0.27148438, -0.36328125, 0.046875, -0.0030517578, 0.32421875, 0.40625, 0.34375, -0.06933594, 0.19628906, 0.49804688, -0.5234375, 0.07470703, -0.53125, -0.6328125, 0.4296875, -0.028930664, 0.16210938, 0.47851562, -0.29882812, 0.19921875, 0.1796875, -0.30859375, 0.02734375, -0.22363281, 0.59765625, 0.54296875, -0.140625, 0.24023438, -0.04248047, 0.40429688, -0.115722656, -0.16894531, -0.15527344, 0.17773438, -0.15429688, -0.21386719, 0.66015625, -0.22265625, 0.20117188, -0.084472656, -0.18945312, -0.15429688, -0.578125, 0.08691406, -0.55859375, -0.13574219, -0.036376953, 0.28515625, 0.20410156, 0.20019531, 0.29492188, 0.32421875, -0.072753906, -0.01953125, 0

In [13]:
# ads4 embedding: Suzuki ad (Indonesian) for the XL7

ads4 = """
Suzuki ads : Desain maskulin dan tangguh XL7 cocok banget buat Suzuki Family yang suka eksplorasi. Kabin yang mewah dilengkapi dengan fitur modern bikin perjalanan kamu lebih memorable. #Suzuki #SuzukiIndonesia #YourGear #SekarangUntukNanti #AllNewErtiga
"""

body = json.dumps({"inputText": ads4})

embedding_ads4 = get_embedding(body, modelId, accept, contentType)
print(embedding_ads4)

[-0.040039062, 0.3515625, -0.022949219, 0.8203125, -0.7109375, -0.3671875, 0.24023438, 0.00018882751, 0.16992188, 0.11230469, 0.46484375, 0.041015625, 0.34960938, -0.24414062, -0.123046875, -0.22363281, 0.28320312, -0.111328125, -0.14453125, -0.056640625, -0.067871094, -0.38671875, -0.328125, -0.38085938, -0.19921875, 0.17773438, 0.57421875, 0.0859375, 0.016235352, -0.33203125, 0.36132812, 0.43945312, -0.03149414, 0.18164062, 0.17773438, 0.9609375, 0.36132812, -0.44140625, 0.421875, 0.010498047, 0.09716797, -0.18847656, -0.1640625, -0.38476562, 0.37890625, 0.075683594, 0.48242188, 0.051513672, 0.609375, 0.17382812, -0.60546875, 0.84765625, 0.13183594, -0.41992188, -0.048339844, -0.04345703, 0.5625, -0.51953125, -0.2109375, 0.0625, 0.115722656, 0.43945312, -0.40820312, 0.6484375, 0.10253906, 0.14550781, 0.328125, 0.45507812, 0.17089844, 0.0078125, -0.012268066, 0.33398438, -0.09716797, -0.63671875, -0.13574219, -0.37109375, 0.09326172, -0.10839844, 0.3046875, 0.80078125, -0.20996094, -0

In [14]:
# ads5 embedding: Padma Hotel Bandung ad (Indonesian) describing the hotel, its location, and its services

ads5 = """
Padma hotel Bandung ads : Padma Experience
Terinspirasi dari keindahan Bumi Parahyangan, Padma Hotel Bandung menawarkan pemandangan pegunungan yang spektakuler, pengalaman bersantap menyenangkan, beragam tipe kamar yang elegan, dan hospitality istimewa yang diwujudkan melalui Layanan Butler 24 jam kami. Terletak di lereng Bukit Ciumbuleuit serta tidak jauh dari keramaian pusat kota Bandung, hotel ini hanya berjarak 10 km dari Bandara Internasional Husein Sastranegara dan 9 km dari stasiun kereta api Bandung.
Padma Hotel Bandung adalah tempat sempurna untuk liburan mewah yang menyegarkan tubuh dan jiwa, private gathering, atau acara dan pertemuan yang diatur dengan cermat hingga detail terkecil.
"""

body = json.dumps({"inputText": ads5})

embedding_ads5 = get_embedding(body, modelId, accept, contentType)
print(embedding_ads5)

[-0.5546875, 0.31640625, 0.09423828, 0.546875, -0.375, -0.30273438, 0.14746094, 0.0003528595, -0.030761719, -0.5078125, 0.64453125, -0.59375, 0.22460938, 0.27929688, -0.15039062, -0.23925781, -0.10644531, -0.036621094, -0.20605469, -0.0074768066, 0.33789062, -0.24804688, -0.12011719, 0.24707031, -0.20410156, -0.22460938, 0.625, 0.8359375, -0.14550781, -0.087402344, 0.36914062, -0.21679688, 0.35351562, -0.37109375, 0.049560547, 0.114746094, 0.19433594, -0.18945312, 0.48046875, -1.03125, -0.9765625, 0.33984375, 0.671875, 0.69140625, 0.66796875, -0.18847656, 0.12597656, 0.23730469, 0.39257812, -0.25195312, 0.34570312, -0.50390625, 0.047607422, -0.12695312, 0.043945312, -0.46289062, -0.031982422, -0.14746094, -0.40234375, -0.08300781, -0.4140625, 0.18066406, -0.59375, 0.3359375, -0.119628906, -0.22851562, 0.109375, -0.27148438, 0.27148438, 0.19824219, -0.123535156, -0.26367188, -0.08105469, 0.11376953, -0.025634766, -0.012451172, 0.16601562, 0.18847656, 0.40234375, 0.46289062, 0.22558594, 
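The five ad cells above repeat the same steps with a different string each time. A sketch of a loop-based alternative: the hypothetical `embed_all` helper takes any function that maps one string to one vector, so in the notebook you would pass a wrapper around `get_embedding`, and in a test you can pass a fake.

```python
from typing import Callable, Dict, List

def embed_all(texts: Dict[str, str],
              embed: Callable[[str], List[float]]) -> Dict[str, List[float]]:
    """Embed every text in `texts`, keeping the same keys and order.

    `embed` is any function mapping one string to one embedding vector.
    """
    return {name: embed(text) for name, text in texts.items()}

# Usage with the notebook's objects (requires Bedrock access):
# ad_embeddings = embed_all(
#     {"embedding_ads1": ads1, "embedding_ads2": ads2, "embedding_ads3": ads3,
#      "embedding_ads4": ads4, "embedding_ads5": ads5},
#     lambda t: get_embedding(json.dumps({"inputText": t}), modelId, accept, contentType),
# )
```

Keeping the keys identical to the variable names used in section 3 means the resulting dict can feed the similarity step directly.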

### 3. Items Similarity

To rank the ads against the article, we compute the cosine similarity between `embedding_article` and each ad embedding (`embedding_ads1` through `embedding_ads5`) using NumPy and scikit-learn's `cosine_similarity`, then sort the ads from most to least similar:

In [15]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Embeddings computed in the previous cells, keyed by item name
item_embeddings = {
    'embedding_article': np.array(embedding_article),
    'embedding_ads1': np.array(embedding_ads1),
    'embedding_ads2': np.array(embedding_ads2),
    'embedding_ads3': np.array(embedding_ads3),
    'embedding_ads4': np.array(embedding_ads4),
    'embedding_ads5': np.array(embedding_ads5)
}
item_names = list(item_embeddings.keys())

# Pairwise cosine-similarity matrix: entry [i, j] compares item i with item j
similarity_matrix = cosine_similarity(np.vstack(list(item_embeddings.values())))

def get_similar_items(item_id, top_n=10):
    """Return the top_n items most similar to item_id (excluding item_id itself) and their scores."""
    item_index = item_names.index(item_id)
    item_scores = similarity_matrix[item_index]
    # Sort by descending similarity and drop the query item by index, not by assuming it ranks first
    ranked = [i for i in np.argsort(item_scores)[::-1] if i != item_index]
    similar_indices = ranked[:top_n]

    similar_items = [item_names[i] for i in similar_indices]
    similar_scores = [item_scores[i] for i in similar_indices]
    return similar_items, similar_scores

# Rank the five ads against the article
similar_items, similarity_scores = get_similar_items('embedding_article', top_n=5)

for item, score in zip(similar_items, similarity_scores):
    print(f"Similar item: {item}, Similarity score: {score}")


Similar item: embedding_ads3, Similarity score: 0.3123745363629943
Similar item: embedding_ads1, Similarity score: 0.27998779731140633
Similar item: embedding_ads5, Similarity score: 0.24739610076413643
Similar item: embedding_ads2, Similarity score: 0.24457456821118864
Similar item: embedding_ads4, Similarity score: 0.23339093670841693
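`cosine_similarity` builds the full pairwise matrix, including ad-to-ad scores that this ranking never uses. When only article-to-ad scores are needed, a sketch with plain NumPy: normalize every vector to unit length, take dot products with the article, and sort. The function name `rank_ads` is hypothetical; it computes the same cosine scores, so it should produce the same ordering as `get_similar_items('embedding_article')` above.

```python
import numpy as np

def rank_ads(article_vec, ad_vecs):
    """Rank ads by cosine similarity to the article, highest first.

    article_vec: 1-D sequence of floats.
    ad_vecs: dict mapping ad name -> 1-D sequence of floats (same length as article_vec).
    Returns a list of (ad_name, score) tuples.
    """
    names = list(ad_vecs.keys())
    article = np.asarray(article_vec, dtype=float)
    ads = np.vstack([np.asarray(ad_vecs[n], dtype=float) for n in names])
    # Unit-normalize so that a plain dot product equals cosine similarity
    article = article / np.linalg.norm(article)
    ads = ads / np.linalg.norm(ads, axis=1, keepdims=True)
    scores = ads @ article
    order = np.argsort(scores)[::-1]
    return [(names[i], float(scores[i])) for i in order]

# Usage with the notebook's embeddings:
# ads_only = {k: v for k, v in item_embeddings.items() if k != 'embedding_article'}
# for name, score in rank_ads(embedding_article, ads_only):
#     print(name, score)
```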
