# **Product Recommendation System**

### Task 1 - Set up project environment

Install the required packages.

In [None]:
!pip install openai==1.16.2 python-dotenv

Import the required modules and set up the OpenAI API client.

In [None]:
import pandas as pd
import numpy as np
import os
from openai import OpenAI
from dotenv import load_dotenv
from matplotlib import pyplot as plt
import plotly.express as px

from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

In [None]:


# Loading API key and organization ID from a dotenv file
load_dotenv(dotenv_path='apikey.env.txt')

# Retrieving API key and organization ID from environment variables
APIKEY = os.getenv("APIKEY")
ORGID = os.getenv("ORGID")

# Creating an instance of the OpenAI client with the provided API key and organization ID
client = OpenAI(
  organization= ORGID,
  api_key=APIKEY
)

client
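
`os.getenv` returns `None` when a variable is missing, so a typo in the dotenv file only surfaces later as an authentication error. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (not part of `python-dotenv` or the OpenAI client):

```python
import os

def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check the dotenv file."
        )
    return value

# Demonstration with a throwaway variable (hypothetical name, for illustration only)
os.environ["DEMO_KEY"] = "demo-value"
demo_value = require_env("DEMO_KEY")
```

In this notebook the same helper could wrap the `APIKEY` lookup, for example `APIKEY = require_env("APIKEY")`.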

Import our dataset

In [None]:
data=pd.read_csv('products_dataset.csv')
data

The IDs of the 8 products the user viewed most recently:

In [None]:
searched_products_id = [
    'P1938',
    'P1970',
    'P1044',
    'P1838',
    'P1048',
    'P1017',
    'P1310',
    'P1444',
]

### Task 2 - Prepare the dataset

Let's label the data points that were recently viewed.

In [None]:
data['product_status']="not_viewed"
data.loc[data.product_id.isin(searched_products_id),"product_status" ]="recently_viewed"
data[data.product_status=="recently_viewed"]

Now let's combine the product `title` and `description` and store the result in a column called `combined`.

In [None]:
# Join with a space so the last word of the title does not run into the description
data['combined']=data.title + ' ' + data.description
data
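
If either column contains missing values, string concatenation yields `NaN` for that row, and the embeddings request would then receive a non-string input. The following is a small sketch on a toy DataFrame (the rows are made up for illustration and are not from `products_dataset.csv`) showing one way to guard against that:

```python
import pandas as pd

# Toy data standing in for the real dataset; the second description is missing
toy = pd.DataFrame({
    "title": ["Red mug", "Desk lamp"],
    "description": ["Ceramic, 350 ml", None],
})

# Replace missing values with empty strings, join with a space,
# then strip the trailing space left behind by an empty description
toy["combined"] = (
    toy["title"].fillna("") + " " + toy["description"].fillna("")
).str.strip()

combined_values = toy["combined"].tolist()
```

Whether empty descriptions should be kept or dropped depends on the dataset; this sketch simply keeps the title on its own.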

### Task 3 - Text embedding and visualization


Creating the text embedding vectors

In [None]:
response = client.embeddings.create(
    input=data.combined.to_list(),
    model="text-embedding-3-small",
    dimensions=512
)
vectors=[d.embedding for d in response.data]
data['text_embeddings']=vectors
data
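
The cell above sends the whole `combined` column in a single request. For a larger dataset it may be safer to send the texts in smaller batches. Below is a minimal sketch, assuming a hypothetical helper `embed_in_batches` and an arbitrary batch size; the embedding call itself is passed in as a function so the batching logic can be tried without an API key:

```python
def embed_in_batches(texts, embed_fn, batch_size=100):
    """Call embed_fn on consecutive slices of texts and collect the vectors in order."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embed_fn(batch))
    return vectors

# Stand-in for the real API call: maps each text to a one-element vector
def fake_embed(batch):
    return [[float(len(text))] for text in batch]

demo_vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
```

With the real client, `embed_fn` could be a small function that calls `client.embeddings.create(input=batch, model="text-embedding-3-small", dimensions=512)` and returns `[d.embedding for d in response.data]`, as in the cell above.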

> Each vector has 512 dimensions. To visualize the vectors in a scatter plot, we use Principal Component Analysis (PCA) to reduce the dimensionality from 512 to 2.

In [None]:
pca=PCA(2)
vector_2d=pca.fit_transform(data.text_embeddings.to_list())
data['pca1']=vector_2d[:,0]
data['pca2']=vector_2d[:,1]
data
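
A 2D projection discards most of the information in a 512-dimensional vector, so it can help to check how much variance the two components retain before reading too much into the plot. The sketch below uses randomly generated vectors as a stand-in for the real embeddings (an assumption made so the example runs without the API); `explained_variance_ratio_` is the scikit-learn attribute that reports the share of variance per component:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the embedding matrix: 200 vectors with 512 dimensions
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(200, 512))

demo_pca = PCA(n_components=2)
points_2d = demo_pca.fit_transform(fake_embeddings)

# Fraction of the total variance captured by the two components together
retained = demo_pca.explained_variance_ratio_.sum()
print(f"Variance retained by 2 components: {retained:.1%}")
```

On the notebook's own data, the same check would be `pca.explained_variance_ratio_.sum()` after the `fit_transform` call above.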

Now that we have the text embedding vectors in two dimensions, we can use them to create a 2D plot.

In [None]:
px.scatter(data,x='pca1', y='pca2', color='product_status')

### Task 4 - Find similar products

Get the data related to `recently_viewed` and `not_viewed` products

In [None]:
df_recently_viewed=data[data.product_status=='recently_viewed']
df_not_viewed=data[data.product_status=='not_viewed']

Convert the embedding vectors to NumPy arrays

In [None]:
vectors_recently_viewed=[np.array(vector) for vector in df_recently_viewed.text_embeddings]
vectors_not_viewed=[np.array(vector) for vector in df_not_viewed.text_embeddings]
vectors_recently_viewed

Find the similarity between each viewed product and all the unviewed products.

In [None]:

similarity_matrix=cosine_similarity(vectors_recently_viewed,vectors_not_viewed)
top_ids=[]
for row in similarity_matrix:
  top_id=np.argmax(row)
  top_ids.append(top_id)

# The column is named product_id (singular), as used in the earlier cells
most_similar_product_ids=list(df_not_viewed.iloc[top_ids].product_id)
most_similar_product_ids
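
Taking one `argmax` per viewed product can return the same unviewed product more than once, so the list may contain fewer than 8 distinct recommendations. As an alternative, the sketch below ranks each candidate by its best similarity to any viewed product and keeps the top `k` distinct candidates. It uses plain NumPy and hypothetical helper names (`cosine_similarity_matrix`, `top_k_candidates`) on toy 2D vectors; it is one possible variation, not the method used in this notebook:

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

def top_k_candidates(viewed, candidates, k=3):
    """Positions of the k candidates most similar to any viewed vector."""
    sim = cosine_similarity_matrix(viewed, candidates)
    best_per_candidate = sim.max(axis=0)    # best match over all viewed items
    order = np.argsort(-best_per_candidate) # highest similarity first
    return order[:k].tolist()

# Toy vectors: two viewed items and four candidates
viewed = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0.9, 0.1], [0.2, 0.8], [-1.0, 0.0], [0.7, 0.7]]
top = top_k_candidates(viewed, candidates, k=2)
```

The returned positions index into the candidate list, so in the notebook they would be used with `df_not_viewed.iloc[...]` in the same way as `top_ids`.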

### Task 5 - Recommend products based on the searched products

Let's update the status of the top similar products to `recommended`.

In [None]:
data.loc[data.product_id.isin(most_similar_product_ids),"product_status"]="recommended"
data[data.product_status=="recommended"]

Let's visualize the recommended products.

In [None]:
# Pass hover_data as a list, which is accepted across plotly versions
px.scatter(data,x='pca1',y='pca2',color='product_status',hover_data=['title'])