# Personalized Product Recommendations
In this project, we aim to develop a product recommendation system using the **Bayesian Personalized Ranking (BPR)** method. Traditional approaches treat recommendation as a prediction problem, estimating a rating or score for each user-item pair independently. BPR instead treats it as a ranking problem: it optimizes a personalized ranking of items for each user, learning to score items a user has interacted with above items they have not.

We've chosen to implement this using the LightFM model for several reasons:

1. **Efficiency with Large Datasets:** LightFM trains directly on sparse interaction matrices and supports multi-threaded fitting, which makes it practical for a dataset of our size (about 206,000 users and nearly 50,000 products).

2. **Incorporation of Additional Features:** LightFM can use additional features about items and users, representing each one as the sum of its features' embeddings. This lets us feed more information into the model, which may improve the accuracy of our recommendations.

3. **Handling the Cold Start Problem:** Because those representations are built from features, LightFM can still produce reasonable recommendations for users or items with little or no interaction history, which helps mitigate the 'cold start' problem common in recommendation systems.

By implementing BPR with LightFM, we aim to provide personalized product recommendations grounded in each user's past preferences and orders.
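The feature and cold-start points above follow from how LightFM builds its representations: a user's (or item's) latent vector and bias are the sums of the embeddings and biases of its features, and a user-item score is the dot product of the two vectors plus both biases. When no `user_features` / `item_features` are passed to `fit`, LightFM uses identity matrices (one feature per ID), which reduces to plain matrix factorization. The NumPy sketch below re-implements only that scoring rule for illustration; it is not LightFM's code, and `hybrid_scores`, the random embeddings, and the "shared feature" are hypothetical.

```python
import numpy as np

def hybrid_scores(user_features, item_features, user_emb, item_emb, user_bias, item_bias):
    """LightFM-style scoring sketch: representations are sums of feature embeddings.

    user_features: (n_users, n_user_feats) indicator/weight matrix
    item_features: (n_items, n_item_feats) indicator/weight matrix
    user_emb: (n_user_feats, k), item_emb: (n_item_feats, k)
    user_bias: (n_user_feats,), item_bias: (n_item_feats,)
    Returns an (n_users, n_items) score matrix.
    """
    q = user_features @ user_emb        # user latent vectors
    p = item_features @ item_emb        # item latent vectors
    b_u = user_features @ user_bias     # per-user bias
    b_i = item_features @ item_bias     # per-item bias
    return q @ p.T + b_u[:, None] + b_i[None, :]

rng = np.random.default_rng(0)
k = 4
# Hypothetical user features: columns 0-1 are per-user ID features,
# column 2 is a feature shared by both known users (e.g. a customer segment)
user_features = np.array([
    [1.0, 0.0, 1.0],  # known user A
    [0.0, 1.0, 1.0],  # known user B
    [0.0, 0.0, 1.0],  # new user: no ID history, only the shared feature
])
item_features = np.eye(5)  # items described by their IDs only
user_emb = rng.normal(size=(3, k))
item_emb = rng.normal(size=(5, k))
user_bias = np.zeros(3)
item_bias = rng.normal(size=5)

scores = hybrid_scores(user_features, item_features, user_emb, item_emb, user_bias, item_bias)
print(scores.shape)              # (3, 5)
print(np.argsort(-scores[2]))    # ranking for the new user, driven by feature 2 alone
```

The new user still receives a full row of scores because the shared feature carries a learned embedding, which is the mechanism behind the cold-start claim.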

---

# 1. Data Preparation
First, we need a dataset that includes user IDs, product IDs, and some form of implicit feedback indicating a user's preference for a product. In our case, purchase history serves as the implicit feedback. This dataset needs to be transformed into a user-item interaction matrix in which each entry records how many times a user purchased a specific product.
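As a small illustration of that transformation, the sketch below turns a made-up purchase log into a count matrix using NumPy only. The helper `build_interaction_counts` and the toy IDs are hypothetical; `np.unique(..., return_inverse=True)` plays the same role as the `LabelEncoder` step used later (mapping raw IDs to sorted, contiguous 0-based indices), and a dense array stands in for the sparse matrix so the result is easy to read.

```python
import numpy as np

def build_interaction_counts(user_ids, product_ids):
    """Hypothetical helper: turn a purchase log into a dense user x item count matrix.

    Returns (matrix, user_classes, item_classes), where row r corresponds to
    user_classes[r] and column c to item_classes[c].
    """
    # Map raw IDs to contiguous 0-based indices (sorted order, like LabelEncoder)
    user_classes, user_idx = np.unique(user_ids, return_inverse=True)
    item_classes, item_idx = np.unique(product_ids, return_inverse=True)

    counts = np.zeros((len(user_classes), len(item_classes)), dtype=np.int64)
    # np.add.at accumulates repeated (user, item) pairs instead of overwriting them
    np.add.at(counts, (user_idx, item_idx), 1)
    return counts, user_classes, item_classes

# Toy log: user 7 bought product 300 twice; user 3 bought products 100 and 300 once each
users = np.array([7, 3, 7, 3])
products = np.array([300, 100, 300, 300])

matrix, user_classes, item_classes = build_interaction_counts(users, products)
print(user_classes.tolist())  # [3, 7]
print(item_classes.tolist())  # [100, 300]
print(matrix.tolist())        # [[1, 1], [0, 2]]
```

At the size of the full dataset a dense array is not an option (roughly 206,000 × 50,000 cells), which is why the notebook aggregates with `groupby(...).size()` and stores the result in a sparse `coo_matrix`.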

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix
from sklearn.preprocessing import LabelEncoder
from lightfm import LightFM
```

```python
# Load the datasets
orders_df = pd.read_csv('dataset/csv/orders.csv')
order_products_prior_df = pd.read_csv('dataset/csv/order_products__prior.csv')
order_products_train_df = pd.read_csv('dataset/csv/order_products__train.csv')
```

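```python
# Stack the prior and train order lines, then attach each order's metadata
# (user_id, order_number, timing columns) via a left join on order_id
order_products_total_df = pd.concat([order_products_prior_df, order_products_train_df])
merged_df = pd.merge(order_products_total_df, orders_df, on='order_id', how='left')
merged_df.head()
```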

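|   | order_id | product_id | add_to_cart_order | reordered | user_id | eval_set | order_number | order_dow | order_hour_of_day | days_since_prior_order |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 33120 | 1 | 1 | 202279 | prior | 3 | 5 | 9 | 8.0 |
| 1 | 2 | 28985 | 2 | 1 | 202279 | prior | 3 | 5 | 9 | 8.0 |
| 2 | 2 | 9327 | 3 | 0 | 202279 | prior | 3 | 5 | 9 | 8.0 |
| 3 | 2 | 45918 | 4 | 1 | 202279 | prior | 3 | 5 | 9 | 8.0 |
| 4 | 2 | 30035 | 5 | 0 | 202279 | prior | 3 | 5 | 9 | 8.0 |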


```python
# Create a user-item interaction DataFrame: one row per (user, product) with its purchase count
user_item_interaction = merged_df.groupby(['user_id', 'product_id']).size().reset_index(name='counts')
user_item_interaction.head()
```

|   | user_id | product_id | counts |
|---|---|---|---|
| 0 | 1 | 196 | 11 |
| 1 | 1 | 10258 | 10 |
| 2 | 1 | 10326 | 1 |
| 3 | 1 | 12427 | 10 |
| 4 | 1 | 13032 | 4 |


```python
# Data preprocessing: map raw user and product IDs to contiguous 0-based indices,
# which serve as row/column positions in the sparse matrix and as LightFM's internal IDs
user_enc = LabelEncoder()
user_item_interaction['user'] = user_enc.fit_transform(user_item_interaction['user_id'].values)
n_users = user_item_interaction['user'].nunique()

item_enc = LabelEncoder()
user_item_interaction['item'] = item_enc.fit_transform(user_item_interaction['product_id'].values)
n_items = user_item_interaction['item'].nunique()
```


```python
# Create a sparse user x item interaction matrix; each stored value is a purchase count
interactions = coo_matrix(
    (
        user_item_interaction['counts'].values,
        (user_item_interaction['user'].values, user_item_interaction['item'].values),
    ),
    shape=(n_users, n_items),
)
interactions
```

```
<206209x49685 sparse matrix of type '<class 'numpy.int64'>'
	with 13863746 stored elements in COOrdinate format>
```
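Before training, it can be worth checking how sparse the matrix is and how many users have very little history, since those users are the ones most affected by the cold-start issue mentioned in the introduction. The sketch below is hypothetical (`summarize_interactions` and the `min_items` threshold are our own choices); it relies only on the `shape`, `nnz`, and `row` attributes that SciPy's `coo_matrix` provides.

```python
import numpy as np

def summarize_interactions(matrix, min_items=5):
    """Hypothetical helper: density and number of light users for a COO matrix.

    Assumes `matrix` exposes `.shape`, `.nnz` and `.row` like scipy.sparse.coo_matrix
    and has no duplicate (row, col) entries (true here, since counts were aggregated first).
    """
    n_users, n_items = matrix.shape
    density = matrix.nnz / (n_users * n_items)
    # Number of distinct products each user has purchased
    items_per_user = np.bincount(matrix.row, minlength=n_users)
    light_users = int((items_per_user < min_items).sum())
    return density, light_users

# In the notebook: density, light_users = summarize_interactions(interactions)
```

From the shape and stored-element count printed above, the density works out to 13,863,746 / (206,209 × 49,685) ≈ 0.14%, so fewer than 2 in 1,000 user-product pairs have any purchase.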

# 2. Bayesian Personalized Ranking (BPR)
Now that we have our user-item interaction matrix, we'll train a Bayesian Personalized Ranking (BPR) model with LightFM. Rather than fitting the interaction values directly, BPR learns user and item embeddings so that, for each user, items they have interacted with are ranked above items they have not. LightFM can additionally take user and item features to enhance the model's performance, especially for 'cold start' users and items; the model trained here uses the interaction matrix alone.
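The BPR criterion itself is compact. For a user u, a purchased item i and a non-purchased item j, training pushes score(u, i) above score(u, j) by maximizing ln σ(score(u, i) - score(u, j)) minus an L2 penalty. The NumPy sketch below runs stochastic gradient steps of that objective on plain matrix-factorization embeddings. It only illustrates the idea and is not LightFM's implementation (LightFM also learns item biases and feature embeddings and uses adaptive learning schedules such as Adagrad/Adadelta); `bpr_step`, `mean_bpr_loss`, `lr`, `reg`, and the toy data are hypothetical.

```python
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    """One SGD step on the BPR criterion for user u, positive item i, negative item j.

    U is the (n_users, k) user-embedding matrix and V the (n_items, k) item-embedding
    matrix; both are updated in place. Returns -ln sigma(x_uij) before the update.
    """
    u_vec, i_vec, j_vec = U[u].copy(), V[i].copy(), V[j].copy()
    x_uij = u_vec @ (i_vec - j_vec)        # score(u, i) - score(u, j)
    sig = 1.0 / (1.0 + np.exp(-x_uij))     # sigma(x_uij)
    g = 1.0 - sig                          # d/dx ln sigma(x) = 1 - sigma(x)
    # Gradient ascent on ln sigma(x_uij) - (reg / 2) * ||params||^2
    U[u] += lr * (g * (i_vec - j_vec) - reg * u_vec)
    V[i] += lr * (g * u_vec - reg * i_vec)
    V[j] += lr * (-g * u_vec - reg * j_vec)
    return float(np.log1p(np.exp(-x_uij)))

def mean_bpr_loss(U, V, positives, n_items):
    """Average -ln sigma(x_uij) over every (user, positive, negative) triple."""
    losses = []
    for u, pos in positives.items():
        for i in pos:
            for j in range(n_items):
                if j not in pos:
                    losses.append(np.log1p(np.exp(-(U[u] @ (V[i] - V[j])))))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 8
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))
positives = {u: {u, u + 1} for u in range(n_users)}  # toy "purchases"

loss_before = mean_bpr_loss(U, V, positives, n_items)
for _ in range(5000):
    u = int(rng.integers(n_users))
    i = int(rng.choice(sorted(positives[u])))
    j = int(rng.integers(n_items))
    if j in positives[u]:
        continue  # BPR needs an item the user has NOT purchased
    bpr_step(U, V, u, i, j)
loss_after = mean_bpr_loss(U, V, positives, n_items)
print(f"mean BPR loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Only the relative order of scores enters the loss, which is what makes BPR a ranking objective rather than a regression on the interaction values.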

```python
# Train the BPR model: 30-dimensional embeddings, 10 passes over the interactions
model = LightFM(no_components=30, loss='bpr')
model.fit(interactions, epochs=10)
```


# 3. Making Recommendations
After the model is trained, we can use it to make personalized product recommendations for each user. Given a user, the model scores every product using the learned user and item embeddings, and the top-ranked products (optionally excluding ones the user has already bought) are returned as recommendations.
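A minimal sketch of that step, assuming a fitted model with LightFM's `predict(user_ids, item_ids)` interface (a single Python `int` user ID broadcast over an `np.int32` array of item IDs, both being LightFM's internal indices). The helper `recommend_top_k` and its `exclude` option are hypothetical; whether to drop already-bought products is a design choice, since repeat purchases are common in grocery data.

```python
import numpy as np

def recommend_top_k(model, user_index, n_items, k=10, exclude=None):
    """Return the k highest-scoring internal item indices for one user.

    `model` is anything with LightFM's predict(user_ids, item_ids) interface;
    `exclude` is an optional collection of internal item indices to skip.
    """
    item_ids = np.arange(n_items, dtype=np.int32)
    # LightFM broadcasts a plain Python int user ID across all item_ids
    scores = np.asarray(model.predict(int(user_index), item_ids), dtype=float)
    if exclude is not None:
        scores[np.asarray(list(exclude), dtype=np.int64)] = -np.inf
    k = min(k, n_items)
    # argpartition finds the top k without a full sort; then order just those k
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]

# In the notebook this could be called as, e.g.:
#   user = user_enc.transform([1])[0]              # internal index of user_id 1
#   bought = interactions.tocsr()[user].indices    # products already purchased
#   top = recommend_top_k(model, user, n_items, k=10, exclude=bought)
#   item_enc.inverse_transform(top)                # back to product_id values
```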

# 4. Evaluation
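Finally, we'll evaluate the performance of our recommendation system. A common metric for this is precision at k: the fraction of the top-k recommended items that are relevant, meaning they appear among the user's held-out interactions. We'll calculate this metric on a test set the model hasn't seen during training to understand how well it generalizes to unseen data. Because the model above is fit on the full interaction matrix, this requires splitting the interactions into train and test portions first (LightFM provides `random_train_test_split` for this) and refitting on the training portion only.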
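LightFM ships this metric as `lightfm.evaluation.precision_at_k`, which takes the model, the test interactions, and optionally `train_interactions` so known positives are not counted as recommendations. To make the computation explicit, here is a small NumPy sketch over a dense score matrix; `precision_at_k_dense` and the toy arrays are hypothetical, and, like LightFM's default, it averages only over users with at least one test interaction. The commented lines at the end outline the library route and assume `lightfm` is installed.

```python
import numpy as np

def precision_at_k_dense(scores, test, train=None, k=10):
    """Mean precision@k over users with at least one test interaction.

    scores: (n_users, n_items) predicted scores
    test, train: (n_users, n_items) arrays where nonzero means an interaction
    Items present in `train` are removed from each user's ranking.
    """
    scores = np.array(scores, dtype=float)  # copy so the caller's array is untouched
    test = np.asarray(test) != 0
    if train is not None:
        scores[np.asarray(train) != 0] = -np.inf
    # Each user's k highest-scoring items (order within the k does not matter)
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = np.take_along_axis(test, top_k, axis=1).sum(axis=1)
    has_test = test.any(axis=1)
    return float((hits[has_test] / k).mean())

scores = np.array([[0.9, 0.8, 0.1, 0.2],
                   [0.1, 0.2, 0.3, 0.4],
                   [0.5, 0.5, 0.5, 0.5]])
test = np.array([[1, 0, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0]])   # third user has no test items and is skipped
train = np.array([[0, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 0, 0]])
print(precision_at_k_dense(scores, test, k=2))               # 0.5
print(precision_at_k_dense(scores, test, train=train, k=2))  # 0.75

# Library route in the notebook (not run here):
#   from lightfm.cross_validation import random_train_test_split
#   from lightfm.evaluation import precision_at_k
#   train, test = random_train_test_split(interactions, test_percentage=0.2,
#                                         random_state=np.random.RandomState(42))
#   model = LightFM(no_components=30, loss='bpr').fit(train, epochs=10)
#   precision_at_k(model, test, train_interactions=train, k=10).mean()
```

Excluding training items matters: in the toy data, the second user's top-scored item is already known from training, and removing it lets a held-out item into the top 2.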