# BIA 678 - Final Project

#### Part 4

Product Recommendations Assignment

For this assignment, you will analyze Instacart order data to uncover product associations that can inform recommendation systems to suggest complementary products to shoppers. 

Tasks:
1. Prepare the Instacart order data for association rule mining using an algorithm like Apriori. This involves aggregating orders per user, transforming orders into product sets, and filtering out low-occurrence products.

2. Apply association rule mining to identify product relationships using minimum support, minimum confidence, and lift thresholds. Analyze the rules to find the most relevant associations for recommendations.

3. Propose a recommendation system interface that suggests additional products to shoppers during checkout based on your association rule results. Prioritize rules with higher lift values. Visualize what this recommender experience could look like.

Additionally, select one or more of the following tasks:

4a. Smart Basket Recommendations: Design a smart basket system that tracks products shoppers add in real-time and provides live suggestions based on the association rules to prompt additional purchases. 

4b. Content-Based Recommendations: Build a content-based recommender that uses product descriptions and properties to match people to similar products. What product attributes are most meaningful to use for similarity?

4c. Collaborative Filtering Recommendations: Implement a collaborative filtering system that uses historical purchase data to identify shoppers with similar buying patterns and generates recommendations based on what similar shoppers purchased. 

Key Deliverables: 
- A 3-5 page project report documenting your analysis, association rules, recommendation system proposal, and selected extended task. 
- Code and output for the association rule mining 
- Mockups or diagrams for the proposed recommendation interfaces

The goal is to demonstrate you can analyze basket data, discover product relationships, and design compelling product recommendation experiences. What associations exist in grocery shopping data and how can retailers leverage recommendations to encourage larger purchases?
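
To make the three rule metrics named in Task 2 concrete, here is a minimal sketch that computes support, confidence, and lift by hand. The toy baskets and the helper names `support`, `confidence`, and `lift` are illustrative assumptions and are not part of the Instacart data or of mlxtend.

```python
# Toy baskets (hypothetical data, not taken from the Instacart files).
baskets = [
    {"bread", "eggs", "milk"},
    {"bread", "eggs"},
    {"milk", "bananas"},
    {"bread", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"bread", "eggs"}, baskets))
print(confidence({"eggs"}, {"bread"}, baskets))
print(lift({"eggs"}, {"bread"}, baskets))
```

A lift above 1 means the consequent shows up more often alongside the antecedent than it does on its own, which is why Task 3 asks for rules with higher lift to be prioritized.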

In [1]:
# Import packages
import numpy as np
import pandas as pd

from mlxtend.frequent_patterns import apriori, association_rules 

In [2]:
df_aisles = pd.read_csv('aisles.csv')
df_departments = pd.read_csv('departments.csv')
df_orders = pd.read_csv('orders.csv')
df_products = pd.read_csv('products.csv')
df_order_products = pd.read_csv('order_products__prior.csv')
# Keep only the first 50,000 order lines to limit memory use and run time.
df_order_products = df_order_products.iloc[:50000]

In [3]:
df1 = pd.merge(df_order_products, df_orders, on= 'order_id')
df1.head()

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
0,2,33120,1,1,202279,prior,3,5,9,8.0
1,2,28985,2,1,202279,prior,3,5,9,8.0
2,2,9327,3,0,202279,prior,3,5,9,8.0
3,2,45918,4,1,202279,prior,3,5,9,8.0
4,2,30035,5,0,202279,prior,3,5,9,8.0


In [4]:
prod_aisles = pd.merge(df_products, df_aisles, on = 'aisle_id')
df2 = pd.merge(prod_aisles, df_departments, on = 'department_id')
df2.head()

Unnamed: 0,product_id,product_name,aisle_id,department_id,aisle,department
0,1,Chocolate Sandwich Cookies,61,19,...,...
1,78,Nutter Butter Cookie Bites Go-Pak,...,...,...,...
2,102,Danish Butter Cookies,...,...,...,...
3,172,Gluten Free All Natural Chocolate Chip Cookies,...,...,...,...
4,285,Mini Nilla Wafers Munch Pack,...,...,...,...

In [5]:
combined_df = pd.merge(df1, df2, on = 'product_id').reset_index(drop=True)
combined_df.head()

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_name,aisle_id,department_id,aisle,department
0,2,33120,1,1,202279,prior,3,5,9,8.0,Organic Egg Whites,86,16,eggs,dairy eggs
1,26,33120,5,0,153404,prior,2,0,16,7.0,Organic Egg Whites,86,16,eggs,dairy eggs
2,120,33120,13,0,23750,prior,11,6,8,10.0,Organic Egg Whites,86,16,eggs,dairy eggs
3,327,33120,5,1,58707,prior,21,6,9,8.0,Organic Egg Whites,86,16,eggs,dairy eggs
4,390,33120,28,1,166654,prior,48,0,12,9.0,Organic Egg Whites,86,16,eggs,dairy eggs


In [6]:
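# Sample 1,000 order lines and keep only the user and product columns.
# Sampling individual rows splits up baskets, so each user's product set below is partial.
basket_lines = combined_df.sample(n=1000)[['user_id', 'product_name']]
# One row per user, one column per product; True if the user bought the product.
basket = pd.crosstab(basket_lines['user_id'], basket_lines['product_name']).astype('bool')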

In [7]:
#Checking and removing index.
basket=basket.reset_index(drop=True)
basket.index

RangeIndex(start=0, stop=869, step=1)
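
Because the sample above is drawn row by row, most users keep only one or two of their products. One possible alternative, sketched here on toy data, is to sample whole users so that every sampled basket stays intact. The `lines` frame is a hypothetical stand-in for `combined_df`, and `random_state=0` is an assumption added only to make the sketch repeatable.

```python
import pandas as pd

# Toy order lines (hypothetical stand-in for combined_df).
lines = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "product_name": ["Banana", "Eggs", "Bread", "Banana", "Milk",
                     "Eggs", "Bread", "Milk"],
})

# Sample users, not rows, so each sampled basket keeps all of its items.
sampled_users = lines["user_id"].drop_duplicates().sample(n=2, random_state=0)
subset = lines[lines["user_id"].isin(sampled_users)]

# One row per user, one boolean column per product.
basket = pd.crosstab(subset["user_id"], subset["product_name"]).astype(bool)
print(basket)
```

The same pattern would apply to `order_id` if baskets were defined per order instead of per user.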

In [20]:
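# Run Apriori on the one-hot basket matrix. With 869 baskets, min_support=0.00002 is below
# the support of a single occurrence (1/869, about 0.00115), so no itemset is filtered out.
frequent_itemsets = apriori(basket, min_support=0.00002, use_colnames=True).sort_values('support', ascending=False)

frequent_itemsets.head(10)
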
Unnamed: 0,support,itemsets
48,0.021864,(Banana)
45,0.010357,(Bag of Organic Bananas)
422,0.008055,(Organic Hass Avocado)
470,0.006904,(Organic Raspberries)
39,0.005754,(Asparagus)
529,0.005754,(Organic Zucchini)
623,0.005754,(Russet Potato)
476,0.004603,(Organic Red Onion)
502,0.004603,(Organic Strawberries)
494,0.004603,(Organic Sour Cream)
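
The support values above can be translated back into basket counts, which shows what a given `min_support` actually demands. The short calculation below assumes the 869 baskets reported by the `RangeIndex` output; the "at least 5 baskets" threshold is a hypothetical example, not a value used in this notebook.

```python
import math

n_baskets = 869        # rows in `basket`, from the RangeIndex output
min_support = 0.00002  # threshold passed to apriori

# Smallest number of baskets an itemset must appear in to reach min_support.
min_count = max(1, math.ceil(min_support * n_baskets))
print(min_count)

# Support of an itemset that occurs in exactly one basket.
single_support = round(1 / n_baskets, 6)
print(single_support)

# Hypothetical stricter threshold: require at least 5 baskets.
five_support = round(5 / n_baskets, 6)
print(five_support)
```

With these numbers, an itemset seen in a single basket already clears the threshold, so the later rules are built largely from one-off co-occurrences.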


In [21]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
rules.head(20)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,"(Large Brown Eggs, Honey Wheat Bread)",(Hamburger Buns),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
1,"(Large Brown Eggs, Hamburger Buns)",(Honey Wheat Bread),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
2,"(Honey Wheat Bread, Hamburger Buns)",(Large Brown Eggs),0.001151,0.002301,0.001151,1.0,434.5,0.001148,inf,0.998848
3,(Large Brown Eggs),"(Honey Wheat Bread, Hamburger Buns)",0.002301,0.001151,0.001151,0.5,434.5,0.001148,1.997699,1.0
4,(Honey Wheat Bread),"(Large Brown Eggs, Hamburger Buns)",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
5,(Hamburger Buns),"(Large Brown Eggs, Honey Wheat Bread)",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
6,"(Organic Light Agave Nectar, Sharp Cheddar Che...",(Newman O's Creme Filled Mint Chocolate Cookies),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
7,"(Organic Light Agave Nectar, Newman O's Creme ...",(Sharp Cheddar Cheese),0.001151,0.002301,0.001151,1.0,434.5,0.001148,inf,0.998848
8,(Newman O's Creme Filled Mint Chocolate Cookie...,(Organic Light Agave Nectar),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
9,(Organic Light Agave Nectar),(Newman O's Creme Filled Mint Chocolate Cookie...,0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
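
The very large lift values in this table follow directly from the tiny supports. The arithmetic below reproduces rules 0 and 2 under the assumption that the rounded supports 0.001151 and 0.002301 correspond to 1 and 2 of the 869 baskets; the count variables are illustrative names, not columns produced by mlxtend.

```python
n_baskets = 869  # number of rows in `basket`

# Assumed counts behind the rounded supports in the table.
count_rule = 1        # baskets containing eggs, bread, and buns together
count_antecedent = 1  # baskets containing the rule's antecedent
count_buns = 1        # baskets containing Hamburger Buns
count_eggs = 2        # baskets containing Large Brown Eggs

# Confidence: how often the consequent appears when the antecedent does.
rule_confidence = count_rule / count_antecedent

# Lift: confidence divided by the consequent's baseline support.
lift_buns = rule_confidence / (count_buns / n_baskets)  # rule 0
lift_eggs = rule_confidence / (count_eggs / n_baskets)  # rule 2

print(rule_confidence, round(lift_buns, 1), round(lift_eggs, 1))
```

A rule supported by a single basket always has confidence 1.0 and a lift close to the number of baskets, so these values say little about how the products relate across shoppers.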


In [22]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,"(Large Brown Eggs, Honey Wheat Bread)",(Hamburger Buns),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
1,"(Large Brown Eggs, Hamburger Buns)",(Honey Wheat Bread),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
2,"(Honey Wheat Bread, Hamburger Buns)",(Large Brown Eggs),0.001151,0.002301,0.001151,1.0,434.5,0.001148,inf,0.998848
3,(Large Brown Eggs),"(Honey Wheat Bread, Hamburger Buns)",0.002301,0.001151,0.001151,0.5,434.5,0.001148,1.997699,1.0
4,(Honey Wheat Bread),"(Large Brown Eggs, Hamburger Buns)",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
5,(Hamburger Buns),"(Large Brown Eggs, Honey Wheat Bread)",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
6,"(Organic Light Agave Nectar, Sharp Cheddar Che...",(Newman O's Creme Filled Mint Chocolate Cookies),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
7,"(Organic Light Agave Nectar, Newman O's Creme ...",(Sharp Cheddar Cheese),0.001151,0.002301,0.001151,1.0,434.5,0.001148,inf,0.998848
8,(Newman O's Creme Filled Mint Chocolate Cookie...,(Organic Light Agave Nectar),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0
9,(Organic Light Agave Nectar),(Newman O's Creme Filled Mint Chocolate Cookie...,0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.0


In [11]:
# Keep only rules with lift of at least 5 and confidence of at least 0.5.
rules[(rules['lift'] >= 5) & (rules['confidence'] >= 0.5)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,"(Large Brown Eggs, Honey Wheat Bread)",(Hamburger Buns),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.000000
1,"(Large Brown Eggs, Hamburger Buns)",(Honey Wheat Bread),0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.000000
2,"(Honey Wheat Bread, Hamburger Buns)",(Large Brown Eggs),0.001151,0.002301,0.001151,1.0,434.5,0.001148,inf,0.998848
3,(Large Brown Eggs),"(Honey Wheat Bread, Hamburger Buns)",0.002301,0.001151,0.001151,0.5,434.5,0.001148,1.997699,1.000000
4,(Honey Wheat Bread),"(Large Brown Eggs, Hamburger Buns)",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.000000
...,...,...,...,...,...,...,...,...,...,...
437,"(Spanish Pitted Manzanilla Cocktail Olives, Or...","(Red Mango, Organic Diced Tomatoes)",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.000000
438,(Red Mango),"(Organic Diced Tomatoes, Spanish Pitted Manzan...",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.000000
439,(Organic Diced Tomatoes),"(Red Mango, Spanish Pitted Manzanilla Cocktail...",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.000000
440,(Spanish Pitted Manzanilla Cocktail Olives),"(Red Mango, Organic Skim Milk, Organic Diced T...",0.001151,0.001151,0.001151,1.0,869.0,0.001149,inf,1.000000
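
For the checkout and smart-basket tasks (3 and 4a), filtered rules like these can be turned into live suggestions by matching rule antecedents against the current cart. The sketch below is one possible approach: the `suggest` function is hypothetical, and the three rules are copied from the output above with only the antecedent, consequent, and lift kept.

```python
# Rules as (antecedent, consequent, lift), taken from the rule table.
rules = [
    (frozenset({"Large Brown Eggs", "Honey Wheat Bread"}),
     frozenset({"Hamburger Buns"}), 869.0),
    (frozenset({"Honey Wheat Bread", "Hamburger Buns"}),
     frozenset({"Large Brown Eggs"}), 434.5),
    (frozenset({"Large Brown Eggs"}),
     frozenset({"Honey Wheat Bread", "Hamburger Buns"}), 434.5),
]

def suggest(cart, rules, top_n=3):
    """Suggest items not yet in the cart, ranked by the best matching lift."""
    cart = set(cart)
    best = {}
    for antecedent, consequent, lift in rules:
        # A rule fires only when its whole antecedent is already in the cart.
        if antecedent <= cart:
            for item in consequent - cart:
                best[item] = max(lift, best.get(item, 0.0))
    # Highest lift first; ties broken alphabetically for stable output.
    return sorted(best, key=lambda item: (-best[item], item))[:top_n]

print(suggest({"Large Brown Eggs"}, rules))
print(suggest({"Large Brown Eggs", "Honey Wheat Bread"}, rules))
```

Calling `suggest` again each time an item is added gives the real-time behavior described in Task 4a, with higher-lift rules taking priority as Task 3 requests.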


In [12]:
from sklearn.metrics.pairwise import cosine_similarity

combined_df.head()

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_name,aisle_id,department_id,aisle,department
0,2,33120,1,1,202279,prior,3,5,9,8.0,Organic Egg Whites,86,16,eggs,dairy eggs
1,26,33120,5,0,153404,prior,2,0,16,7.0,Organic Egg Whites,86,16,eggs,dairy eggs
2,120,33120,13,0,23750,prior,11,6,8,10.0,Organic Egg Whites,86,16,eggs,dairy eggs
3,327,33120,5,1,58707,prior,21,6,9,8.0,Organic Egg Whites,86,16,eggs,dairy eggs
4,390,33120,28,1,166654,prior,48,0,12,9.0,Organic Egg Whites,86,16,eggs,dairy eggs


In [13]:
processed_df = combined_df.drop(columns=['eval_set', 'product_name', 'aisle', 'department'])
processed_df.head()

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,user_id,order_number,order_dow,order_hour_of_day,days_since_prior_order,aisle_id,department_id
0,2,33120,1,1,202279,3,5,9,8.0,86,16
1,26,33120,5,0,153404,2,0,16,7.0,86,16
2,120,33120,13,0,23750,11,6,8,10.0,86,16
3,327,33120,5,1,58707,21,6,9,8.0,86,16
4,390,33120,28,1,166654,48,0,12,9.0,86,16


In [14]:
processed_df.shape

(50000, 11)

In [15]:
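# Fill missing values (a user's first order has no prior order) with the column mean.
# Assign the result back instead of calling fillna(inplace=True) on a column selection.
processed_df['days_since_prior_order'] = processed_df['days_since_prior_order'].fillna(processed_df['days_since_prior_order'].mean())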
print(processed_df.isna().sum())

order_id                  0
product_id                0
add_to_cart_order         0
reordered                 0
user_id                   0
order_number              0
order_dow                 0
order_hour_of_day         0
days_since_prior_order    0
aisle_id                  0
department_id             0
dtype: int64


In [16]:
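# Rows are users and columns are products. Each cell holds the user's mean add-to-cart
# position for that product (pivot_table averages by default), or 0 if never purchased.
user_item_matrix = processed_df.pivot_table(index='user_id', columns='product_id', values='add_to_cart_order', fill_value=0)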

In [17]:
similarity_matrix = cosine_similarity(user_item_matrix)
similarity_matrix_df = pd.DataFrame(similarity_matrix, index=user_item_matrix.index, columns=user_item_matrix.index)
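
Cosine similarity compares two users by the angle between their rows of the user-item matrix. A minimal pure-Python sketch of the same calculation on toy purchase vectors follows; the `cosine` helper and the three users are hypothetical and only illustrate what each entry of `similarity_matrix` represents.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no purchases is similar to no one
    return dot / (norm_u * norm_v)

# Columns are four hypothetical products; 1 means the user bought it.
alice = [1, 1, 0, 0]
bob = [1, 1, 1, 0]
carol = [0, 0, 0, 1]

print(round(cosine(alice, bob), 4))
print(cosine(alice, carol))
```

Users with no product in common score 0, so in a sparse matrix like this one most pairs of users are not similar at all.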

In [18]:
def recommend_products(user_id, similarity_matrix_df, user_item_matrix, top_n):
    """Return up to top_n product IDs bought by the most similar users but not by user_id."""
    # Rank the other users by similarity; drop the target explicitly instead of
    # assuming it always sorts first.
    similar_users = similarity_matrix_df[user_id].sort_values(ascending=False).drop(user_id).index
    user_purchases = set(user_item_matrix.columns[user_item_matrix.loc[user_id] > 0])

    recommendations = []
    for similar_user in similar_users:
        similar_user_purchases = set(user_item_matrix.columns[user_item_matrix.loc[similar_user] > 0])
        # Skip products the target already bought or that are already recommended.
        new_products = similar_user_purchases - user_purchases - set(recommendations)
        recommendations.extend(new_products)
        if len(recommendations) >= top_n:
            break

    return recommendations[:top_n]
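
The function above takes products from the nearest users one at a time, so everything hinges on the single closest neighbors. A common variant, sketched here as an assumption and not as the method used in this notebook, scores each candidate product by the summed similarity of every user who bought it. The name `recommend_weighted` and the toy dictionaries are hypothetical.

```python
from collections import defaultdict

def recommend_weighted(target, similarities, purchases, top_n=3):
    """Rank unseen products by the total similarity of the users who bought them."""
    owned = purchases[target]
    scores = defaultdict(float)
    for user, sim in similarities.items():
        if user == target or sim <= 0:
            continue  # ignore the target and users with no overlap
        for product in purchases[user] - owned:
            scores[product] += sim
    # Highest score first; ties broken by product id for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]

# Toy data: purchases per user and each user's similarity to "u1".
purchases = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C", "D"},
    "u4": {"E"},
}
similarities = {"u1": 1.0, "u2": 0.8, "u3": 0.4, "u4": 0.0}

print(recommend_weighted("u1", similarities, purchases))
```

Products bought by several similar users rise to the top, and users with zero similarity contribute nothing.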

In [19]:
userId = 202279
top_n = 10

recommended_products = recommend_products(userId, similarity_matrix_df, user_item_matrix, top_n)
product_dict = pd.Series(df_products.product_name.values, index=df_products.product_id).to_dict()
product_names = list(map(product_dict.get, recommended_products))

print(f"Top {top_n} Product Recommendations for the user {userId}:")
print('-------------')
for i, name in enumerate(product_names):
    print(i, name)


Top 10 Product Recommendations for the user 202279:
-------------
0 Pure Sparkling Water
1 Half & Half
2 Freeze Dried Strawberry Slices
3 Double Chocolate Cake
4 Tiny Fruits Blueberry Apple
5 Organic Freeze Dried Strawberries
6 Organic Freeze-Dried Mango
7 Berry Medley
8 Organic Garlic
9 Organic Small Bunch Celery
