# Intro to Recommender Systems Lab

Complete the exercises below to solidify your knowledge and understanding of recommender systems.

In this lab, we will build a user-similarity-based recommender system step by step. Our data set contains customer grocery purchases, and we will use similarity in purchase behavior to drive the recommendations. The system will generate 5 product recommendations for each customer based on the purchases they have made.

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

In [2]:
data = pd.read_csv('../data/customer_product_sales.csv')

In [3]:
data.head()

|   | CustomerID | FirstName | LastName | SalesID | ProductID | ProductName | Quantity |
|---|---|---|---|---|---|---|---|
| 0 | 61288 | Rosa | Andersen | 134196 | 229 | Bread - Hot Dog Buns | 16 |
| 1 | 77352 | Myron | Murray | 6167892 | 229 | Bread - Hot Dog Buns | 20 |
| 2 | 40094 | Susan | Stevenson | 5970885 | 229 | Bread - Hot Dog Buns | 11 |
| 3 | 23548 | Tricia | Vincent | 6426954 | 229 | Bread - Hot Dog Buns | 6 |
| 4 | 78981 | Scott | Burch | 819094 | 229 | Bread - Hot Dog Buns | 20 |


## Step 1: Create a data frame that contains the total quantity of each product purchased by each customer.

You will need to group by CustomerID and ProductName and then sum the Quantity field.

In [6]:
df_products = data.groupby(['CustomerID', 'ProductName']).agg({'Quantity':'sum'})
df_products.head()

| CustomerID | ProductName | Quantity |
|---|---|---|
| 33 | Apricots - Dried | 1 |
| 33 | Assorted Desserts | 1 |
| 33 | Bandage - Flexible Neon | 1 |
| 33 | Bar Mix - Pina Colada, 355 Ml | 1 |
| 33 | Beans - Kidney, Canned | 1 |
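
To see what the grouping does, here is a minimal sketch on a hypothetical four-row sales log. The column names mirror the lab data, but the values are made up. Repeated purchases of the same product by the same customer collapse into a single row whose `Quantity` is the sum.

```python
import pandas as pd

# Hypothetical toy sales log: customer 1 bought "Milk" in two separate sales.
sales = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "ProductName": ["Milk", "Milk", "Bread", "Milk"],
    "Quantity": [2, 3, 1, 4],
})

# One row per (CustomerID, ProductName) pair, with the quantities summed.
totals = sales.groupby(["CustomerID", "ProductName"]).agg({"Quantity": "sum"})
print(totals)
```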


## Step 2: Use the `pivot_table` method to create a product by customer matrix.

The rows of the matrix should represent the products, the columns should represent the customers, and the values should be the quantities of each product purchased by each customer. You will also need to replace nulls (product and customer combinations with no purchases) with zeros. You can do this with the `fillna` method or, as in the cell below, by passing `fill_value=0` to `pivot_table`.

In [7]:
matrix_products = df_products.pivot_table('Quantity', index = 'ProductName', columns = 'CustomerID', fill_value = 0)
matrix_products.head()

| ProductName / CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anchovy Paste - 56 G Tube | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Appetizer - Mini Egg Roll, Shrimp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 25 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Appetizer - Mushroom Tart | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 0 |
| Appetizer - Sausage Rolls | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 25 | 25 | 25 | 0 | 25 | 0 |
| Apricots - Dried | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
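
The instructions mention `fillna`, while the cell above passes `fill_value=0` to `pivot_table`. On a small hypothetical frame, the two routes give the same numbers. The dtype may differ, because the NaNs force a float column before `fillna` runs.

```python
import pandas as pd

# Hypothetical per-customer totals in long form (Step 1 output after reset_index()).
totals = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "ProductName": ["Bread", "Milk", "Milk"],
    "Quantity": [1, 5, 4],
})

# Route A: pivot_table fills missing (product, customer) cells directly.
matrix_a = totals.pivot_table("Quantity", index="ProductName",
                              columns="CustomerID", fill_value=0)

# Route B: pivot first (missing cells become NaN), then replace NaN with 0.
matrix_b = totals.pivot_table("Quantity", index="ProductName",
                              columns="CustomerID").fillna(0)

print(matrix_a)
print(matrix_b)
```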


## Step 3: Create a customer similarity matrix using `squareform` and `pdist`. For the distance metric, choose "euclidean."

In [10]:
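# pdist compares rows, so transpose the matrix to get one row per customer.
# The result holds pairwise Euclidean distances (0 = identical purchases); Step 4 turns them into similarities.
cust_similarity_matrix = squareform(pdist(matrix_products.T, 'euclidean'))
cust_similarity_matrix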

array([[  0.        ,  11.91637529,  10.48808848, ..., 228.62851966,
        239.        , 229.77380181],
       [ 11.91637529,   0.        ,  11.74734012, ..., 228.01096465,
        239.03765394, 229.70415756],
       [ 10.48808848,  11.74734012,   0.        , ..., 228.08112592,
        238.26665734, 229.77380181],
       ...,
       [228.62851966, 228.01096465, 228.08112592, ...,   0.        ,
        304.13812651, 305.16389039],
       [239.        , 239.03765394, 238.26665734, ..., 304.13812651,
          0.        , 303.10889132],
       [229.77380181, 229.70415756, 229.77380181, ..., 305.16389039,
        303.10889132,   0.        ]])
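
Despite its name, `cust_similarity_matrix` holds distances. The diagonal is 0, and values grow as two customers' purchases diverge. The sketch below uses a hypothetical 3x3 matrix to show what `pdist` and `squareform` return. `pdist` returns a condensed vector of the n(n-1)/2 pairwise distances, and `squareform` expands it into the symmetric square matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical product-by-customer matrix: 3 products (rows) x 3 customers (columns).
matrix = np.array([
    [1, 0, 2],
    [0, 1, 2],
    [3, 0, 0],
])

# pdist compares rows, so transpose to get one row per customer.
condensed = pdist(matrix.T, "euclidean")  # n*(n-1)/2 = 3 pairwise distances
distances = squareform(condensed)         # symmetric 3x3 matrix, zeros on the diagonal

# Entry (0, 2) is the straight-line distance between customers 0 and 2.
manual = np.sqrt(((matrix[:, 0] - matrix[:, 2]) ** 2).sum())
print(distances)
print(manual)
```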

## Step 4: Check your results by generating a list of the top 5 most similar customers for a specific CustomerID.

In [11]:
# Reuse the Step 3 distances and map them to similarities in (0, 1]: identical customers score 1.
customer_similarity = pd.DataFrame(1 / (1 + cust_similarity_matrix), index = matrix_products.columns, columns = matrix_products.columns)
customer_similarity.head()

| CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1.0 | 0.077421 | 0.087047 | 0.0818 | 0.080634 | 0.082709 | 0.074573 | 0.08302 | 0.081503 | 0.08007 | ... | 0.004811 | 0.004669 | 0.004412 | 0.005019 | 0.004312 | 0.004515 | 0.004583 | 0.004355 | 0.004167 | 0.004333 |
| 200 | 0.077421 | 1.0 | 0.078448 | 0.076435 | 0.073693 | 0.075255 | 0.075956 | 0.076435 | 0.077674 | 0.076923 | ... | 0.004824 | 0.004681 | 0.004431 | 0.005047 | 0.004311 | 0.004521 | 0.004614 | 0.004367 | 0.004166 | 0.004335 |
| 264 | 0.087047 | 0.078448 | 1.0 | 0.08007 | 0.0818 | 0.08035 | 0.076923 | 0.080634 | 0.0821 | 0.078448 | ... | 0.004822 | 0.004674 | 0.004416 | 0.005035 | 0.004322 | 0.004543 | 0.004595 | 0.004365 | 0.004179 | 0.004333 |
| 356 | 0.0818 | 0.076435 | 0.08007 | 1.0 | 0.076435 | 0.078187 | 0.075025 | 0.082403 | 0.077171 | 0.075956 | ... | 0.004816 | 0.004671 | 0.004416 | 0.005038 | 0.00431 | 0.004526 | 0.004578 | 0.004365 | 0.004175 | 0.004339 |
| 412 | 0.080634 | 0.073693 | 0.0818 | 0.076435 | 1.0 | 0.078711 | 0.075025 | 0.082403 | 0.078187 | 0.078448 | ... | 0.00481 | 0.004702 | 0.004414 | 0.005034 | 0.004318 | 0.00453 | 0.004578 | 0.004367 | 0.004177 | 0.004349 |


In [17]:
# Exclude the customer itself by label, then take the 5 most similar customers.
top_5 = customer_similarity[98200].drop(98200).nlargest(5)
top_5

CustomerID
11883    0.004456
14782    0.004456
32324    0.004449
14208    0.004443
32180    0.004427
Name: 98200, dtype: float64
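
Taking `nlargest(6)` and slicing off position 0 assumes the customer is its own unique best match. If another customer has an identical purchase vector, both score 1.0, and position 0 is not guaranteed to be the customer itself. Dropping the customer by label avoids that. Below is a sketch with a hypothetical helper `top_similar` and made-up data in which customers 10 and 20 buy identically:

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def top_similar(similarity: pd.DataFrame, customer, k: int = 5) -> pd.Series:
    """Hypothetical helper: the k customers most similar to `customer`, excluding itself."""
    return similarity[customer].drop(customer).nlargest(k)

# Hypothetical product-by-customer matrix; customers 10 and 20 buy identically.
matrix = pd.DataFrame(
    [[1, 1, 0, 5], [2, 2, 0, 5], [0, 0, 3, 5]],
    index=["Bread", "Milk", "Tea"],
    columns=[10, 20, 30, 40],
)

# 1 / (1 + d) maps distances in [0, inf) to similarities in (0, 1].
similarity = pd.DataFrame(
    1 / (1 + squareform(pdist(matrix.T, "euclidean"))),
    index=matrix.columns, columns=matrix.columns,
)

# Customer 20 ties with customer 10 at similarity 1.0 but is never returned for itself.
print(top_similar(similarity, 20, k=2))
```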

## Step 5: From the data frame you created in Step 1, select the records for the list of similar CustomerIDs you obtained in Step 4.

In [18]:
similar_customer = df_products.loc[list(top_5.index)]
similar_customer

| CustomerID | ProductName | Quantity |
|---|---|---|
| 11883 | Artichokes - Jerusalem | 4 |
| 11883 | Assorted Desserts | 4 |
| 11883 | Bandage - Fexible 1x3 | 4 |
| 11883 | Bandage - Flexible Neon | 4 |
| 11883 | Bar Mix - Pina Colada, 355 Ml | 4 |
| ... | ... | ... |
| 32180 | Wine - Cahors Ac 2000, Clos | 9 |
| 32180 | Wine - Chardonnay South | 9 |
| 32180 | Wine - Fume Blanc Fetzer | 9 |
| 32180 | Wine - Redchard Merritt | 9 |
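
The `.loc[list(top_5.index)]` selection works because a list of labels passed to `.loc` on a `MultiIndex` is matched against the first level, here `CustomerID`. Here is a sketch on a hypothetical frame with the same index layout as `df_products`:

```python
import pandas as pd

# Hypothetical Step 1-style frame indexed by (CustomerID, ProductName).
totals = pd.DataFrame(
    {"Quantity": [1, 5, 4, 2]},
    index=pd.MultiIndex.from_tuples(
        [(1, "Bread"), (1, "Milk"), (2, "Milk"), (3, "Tea")],
        names=["CustomerID", "ProductName"],
    ),
)

neighbours = [1, 3]  # stands in for list(top_5.index) from Step 4
# A list passed to .loc on a MultiIndex selects on the first level (CustomerID).
subset = totals.loc[neighbours]
print(subset)
```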


## Step 6: Aggregate those customer purchase records by ProductName, sum the Quantity field, and then rank them in descending order by quantity.

This gives the total quantity of each product purchased by the 5 customers most similar to the selected customer, ordered from most to least purchased.

In [23]:
products_list = similar_customer.groupby('ProductName').agg('sum').sort_values('Quantity', ascending = False)
products_list

| ProductName | Quantity |
|---|---|
| Sea Bass - Whole | 30 |
| Beans - Wax | 27 |
| Apricots - Halves | 27 |
| Sherry - Dry | 26 |
| Pork - Hock And Feet Attached | 22 |
| ... | ... |
| General Purpose Trigger | 4 |
| Guinea Fowl | 4 |
| Halibut - Steaks | 4 |
| Juice - Cranberry, 341 Ml | 4 |
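
Several products tie on quantity. For example, Beans - Wax and Apricots - Halves both total 27. Which tied product ends up inside the top 5 therefore depends on how ties are ordered. Below is a sketch, on hypothetical data, of a ranking with an explicit secondary key (product name) so reruns give the same order:

```python
import pandas as pd

# Hypothetical neighbour purchase rows with a tie in the summed Quantity.
neighbour_rows = pd.DataFrame({
    "ProductName": ["Pears", "Pork", "Beans", "Pears", "Beans"],
    "Quantity": [10, 22, 20, 12, 7],
})

ranked = (
    neighbour_rows.groupby("ProductName")["Quantity"].sum()
    .reset_index()
    # Highest quantity first; ties broken alphabetically for a deterministic order.
    .sort_values(["Quantity", "ProductName"], ascending=[False, True])
    .set_index("ProductName")
)
print(ranked)
```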


## Step 7: Filter the list for products that the chosen customer has not yet purchased and then recommend the top 5 products with the highest quantities that are left.

- Merge the ranked products data frame with the product-by-customer matrix from Step 2 on the ProductName field.
- Filter for records where the chosen customer has not purchased the product.
- Show the top 5 results.

In [24]:
merged_ranked_products = matrix_products.merge(products_list, on = 'ProductName')
merged_ranked_products.head()

| ProductName | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 | Quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Apricots - Dried | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| Apricots - Halves | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 25 | 50 | 25 | 0 | 25 | 25 | 27 |
| Apricots Fresh | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 9 |
| Arizona - Green Tea | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 25 | 0 | 0 | 0 | 0 | 9 |
| Artichokes - Jerusalem | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |


In [27]:
recommended_products = merged_ranked_products[merged_ranked_products[98200] == 0].nlargest(5, columns = 'Quantity')['Quantity']
recommended_products.head()

ProductName
Beans - Wax                      27
Sherry - Dry                     26
Ecolab - Mikroklene 4/4 L        22
Pears - Bosc                     22
Pork - Hock And Feet Attached    22
Name: Quantity, dtype: int64
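
The merge carries the whole product-by-customer matrix along just to read one column. A lighter filter, sketched here on hypothetical data, looks up which products the chosen customer has bought and drops them from the ranking directly. It should select the same products, up to the order of tied quantities.

```python
import pandas as pd

products = pd.Index(["Beans", "Pears", "Pork", "Tea"], name="ProductName")
# Hypothetical product-by-customer matrix (Step 2 shape).
matrix = pd.DataFrame({101: [0, 3, 0, 1], 102: [2, 0, 0, 0]}, index=products)
# Hypothetical neighbour ranking (Step 6 shape), already sorted by Quantity.
ranked = pd.DataFrame({"Quantity": [27, 22, 22, 4]}, index=products)

customer = 101
column = matrix[customer]
already_bought = column[column > 0].index  # products this customer has purchased

# Keep unseen products only, then take the (up to) 5 largest totals.
recommended = ranked[~ranked.index.isin(already_bought)].nlargest(5, "Quantity")["Quantity"]
print(recommended)
```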

## Step 8: Now that we have generated product recommendations for a single user, put the pieces together and iterate over a list of all CustomerIDs.

- Create an empty dictionary that will hold the recommendations for all customers.
- Create a list of unique CustomerIDs to iterate over.
- Iterate over the customer list, performing Steps 4 through 7 for each customer and storing the results in the dictionary you created, keyed by CustomerID.

In [39]:
recommendations_dict = {}
unique_cust_ids = df_products.reset_index()['CustomerID'].unique()
len(unique_cust_ids)

1000

In [45]:
for customer in unique_cust_ids:
    # Step 4: the 5 customers most similar to this one (excluding the customer itself)
    top_5 = customer_similarity[customer].drop(customer).nlargest(5)
    
    # Step 5: purchase records of those similar customers
    similar_customer = df_products.loc[list(top_5.index)]
    
    # Step 6: total quantity per product across the similar customers, highest first
    products_list = similar_customer.groupby('ProductName').agg('sum').sort_values('Quantity', ascending = False)
    
    # Step 7: attach every customer's purchases to each ranked product
    merged_ranked_products = matrix_products.merge(products_list, on = 'ProductName')
    
    # Keep products this customer has not bought and store the top 5 product names
    recommendations_dict[customer] = list(merged_ranked_products[merged_ranked_products[customer] == 0].nlargest(5, columns = 'Quantity')['Quantity'].index)
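
The loop repeats Steps 4 to 7 inline, and Step 10 copies the same body four more times. A helper function is one way to keep those copies in sync. The sketch below assumes inputs shaped like `df_products`, `matrix_products` and `customer_similarity`. The name `recommend`, its parameters and the toy data are hypothetical.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def recommend(customer, similarity, df_products, matrix_products, n_neighbours=5, n_recs=5):
    """Hypothetical helper running Steps 4-7 for one customer; returns product names."""
    # Step 4: most similar customers, excluding the customer itself.
    neighbours = similarity[customer].drop(customer).nlargest(n_neighbours).index
    # Step 5: the neighbours' (CustomerID, ProductName) purchase totals.
    neighbour_rows = df_products.loc[list(neighbours)]
    # Step 6: total quantity per product across the neighbours.
    ranked = neighbour_rows.groupby(level="ProductName")["Quantity"].sum()
    # Step 7: drop products this customer already bought and keep the top n_recs.
    column = matrix_products[customer]
    ranked = ranked[~ranked.index.isin(column[column > 0].index)]
    return list(ranked.nlargest(n_recs).index)

# Hypothetical toy data shaped like Steps 1, 2 and 4.
df_toy = pd.DataFrame(
    {"Quantity": [1, 2, 1, 3, 1]},
    index=pd.MultiIndex.from_tuples(
        [(1, "Bread"), (1, "Milk"), (2, "Milk"), (2, "Tea"), (3, "Eggs")],
        names=["CustomerID", "ProductName"],
    ),
)
matrix_toy = df_toy.reset_index().pivot_table(
    "Quantity", index="ProductName", columns="CustomerID", fill_value=0)
sim_toy = pd.DataFrame(1 / (1 + squareform(pdist(matrix_toy.T, "euclidean"))),
                       index=matrix_toy.columns, columns=matrix_toy.columns)

print(recommend(1, sim_toy, df_toy, matrix_toy, n_neighbours=2, n_recs=2))
```

In the lab, the loop would then reduce to `recommendations_dict = {c: recommend(c, customer_similarity, df_products, matrix_products) for c in unique_cust_ids}`. Each Step 10 cell would only need to rebuild `customer_similarity` with a different metric.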

In [46]:
recommendations_dict

{33: ['Butter - Unsalted',
  'Soup - Campbells Bean Medley',
  'Wine - Blue Nun Qualitatswein',
  'Wine - Ej Gallo Sierra Valley',
  'Bacardi Breezer - Tropical'],
 200: ['Soup - Campbells Bean Medley',
  'Bay Leaf',
  'Muffin - Carrot Individual Wrap',
  'Pork - Kidney',
  'Wanton Wrap'],
 264: ['Bread - Italian Roll With Herbs',
  'Fish - Scallops, Cold Smoked',
  'Fondant - Icing',
  'Soupfoamcont12oz 112con',
  'Veal - Inside, Choice'],
 356: ['Butter - Unsalted',
  'Veal - Inside, Choice',
  'Beets - Candy Cane, Organic',
  'Nut - Chestnuts, Whole',
  'Beans - Wax'],
 412: ['Cheese - Cambozola',
  'Olive - Spread Tapenade',
  'Pepper - Black, Whole',
  'Sauce - Gravy, Au Jus, Mix',
  'Soup - Campbells Bean Medley'],
 464: ['Butter - Unsalted',
  'Bar - Granola Trail Mix Fruit Nut',
  'Peas - Pigeon, Dry',
  'Pepper - Black, Whole',
  'Sauce - Gravy, Au Jus, Mix'],
 477: ['Cheese - Cambozola',
  'Olive - Spread Tapenade',
  'Pepper - Black, Whole',
  'Sprouts - Baby Pea Tendrils',


## Step 9: Store the results in a Pandas data frame. The data frame should have a column for Customer ID and then a column for each of the 5 product recommendations for each customer.

In [47]:
df_recommendations = pd.DataFrame.from_dict(recommendations_dict, orient = 'columns')
df_recommendations.head()

|   | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Butter - Unsalted | Soup - Campbells Bean Medley | Bread - Italian Roll With Herbs | Butter - Unsalted | Cheese - Cambozola | Butter - Unsalted | Cheese - Cambozola | Cod - Black Whole Fillet | Appetizer - Mushroom Tart | Veal - Inside, Choice | ... | Cheese - Wine | Muffin Batt - Choc Chk | Pork - Loin, Bone - In | Fenngreek Seed | Placemat - Scallop, White | Soup - Campbells, Lentil | Skirt - 29 Foot | Wine - Red, Harrow Estates, Cab | Crackers - Trio | Beans - Wax |
| 1 | Soup - Campbells Bean Medley | Bay Leaf | Fish - Scallops, Cold Smoked | Veal - Inside, Choice | Olive - Spread Tapenade | Bar - Granola Trail Mix Fruit Nut | Olive - Spread Tapenade | Wine - Ej Gallo Sierra Valley | Bar - Granola Trail Mix Fruit Nut | Pepper - Black, Whole | ... | Lentils - Red, Dry | Papayas | Potatoes - Idaho 100 Count | Grenadine | Rum - Mount Gay Eclipes | Cheese - Brie,danish | Beans - Kidney White | Halibut - Steaks | Pernod | Sherry - Dry |
| 2 | Wine - Blue Nun Qualitatswein | Muffin - Carrot Individual Wrap | Fondant - Icing | Beets - Candy Cane, Organic | Pepper - Black, Whole | Peas - Pigeon, Dry | Pepper - Black, Whole | Pepper - Black, Whole | Butter - Unsalted | Sardines | ... | Meldea Green Tea Liquor | Meldea Green Tea Liquor | Pants Custom Dry Clean | Halibut - Fletches | Tofu - Firm | Pants Custom Dry Clean | Cheese - Taleggio D.o.p. | Butter - Unsalted | Tea - Jasmin Green | Ecolab - Mikroklene 4/4 L |
| 3 | Wine - Ej Gallo Sierra Valley | Pork - Kidney | Soupfoamcont12oz 112con | Nut - Chestnuts, Whole | Sauce - Gravy, Au Jus, Mix | Pepper - Black, Whole | Sprouts - Baby Pea Tendrils | Soupfoamcont12oz 112con | Lamb - Ground | Wine - Blue Nun Qualitatswein | ... | Muffin - Carrot Individual Wrap | Mussels - Frozen | Cake - Box Window 10x10x2.5 | Ice Cream Bar - Oreo Cone | Cheese - Boursin, Garlic / Herbs | Wiberg Super Cure | Milk - 1% | Flour - Whole Wheat | Cream Of Tartar | Pears - Bosc |
| 4 | Bacardi Breezer - Tropical | Wanton Wrap | Veal - Inside, Choice | Beans - Wax | Soup - Campbells Bean Medley | Sauce - Gravy, Au Jus, Mix | Veal - Inside, Choice | Wine - Crozes Hermitage E. | Peas - Pigeon, Dry | Wine - Ej Gallo Sierra Valley | ... | Coffee - Dark Roast | Wine - Chablis 2003 Champs | Cod - Black Whole Fillet | Wine - Red, Colio Cabernet | Sobe - Tropical Energy | Shrimp - 31/40 | Sprouts - Baby Pea Tendrils | Ice Cream Bar - Oreo Cone | Lettuce - Spring Mix | Pork - Hock And Feet Attached |
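
Step 9 asks for one row per customer: a CustomerID column followed by five recommendation columns. As the output above shows, `orient='columns'` produces the transpose, with customers as columns and recommendation ranks as rows. The sketch below uses `orient='index'` instead. Unlike `orient='columns'`, which raises when the lists differ in length, it also pads customers with fewer than five recommendations with missing values. The column names `Recommendation1` to `Recommendation5` and the toy dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical recommendations keyed by CustomerID, as built in Step 8;
# customer 412 deliberately has only three.
recommendations_dict = {
    33: ["A", "B", "C", "D", "E"],
    200: ["F", "G", "H", "I", "J"],
    412: ["K", "L", "M"],
}

df_recs = (
    pd.DataFrame.from_dict(recommendations_dict, orient="index")  # one row per customer
    .rename(columns=lambda i: f"Recommendation{i + 1}")           # 0..4 -> Recommendation1..5
    .rename_axis("CustomerID")
    .reset_index()                                                 # CustomerID becomes a column
)
print(df_recs)
```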


## Step 10: Change the distance metric used in Step 3 to something other than euclidean (correlation, cityblock, cosine, jaccard, etc.). Regenerate the recommendations for all customers and note the differences.

In [48]:
unique_cust_ids = df_products.reset_index()['CustomerID'].unique()

- Correlation

In [49]:
recommendations_dict = {}
customer_similarity = pd.DataFrame(1 / (1 + squareform(pdist(matrix_products.T, metric = 'correlation'))), index = matrix_products.columns, columns = matrix_products.columns)

for customer in unique_cust_ids:
    top_5 = customer_similarity[customer].drop(customer).nlargest(5)  # exclude the customer itself
    similar_customer = df_products.loc[list(top_5.index)]
    products_list = similar_customer.groupby('ProductName').agg('sum').sort_values('Quantity', ascending = False)
    merged_ranked_products = matrix_products.merge(products_list, on = 'ProductName')
    
    recommendations_dict[customer] = list(merged_ranked_products[merged_ranked_products[customer] == 0].nlargest(5, columns = 'Quantity')['Quantity'].index)
    
df_recommendations = pd.DataFrame.from_dict(recommendations_dict, orient = 'columns')
df_recommendations.head()

|   | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Knife Plastic - White | Otomegusa Dashi Konbu | Water - Mineral, Natural | Cheese - Taleggio D.o.p. | Butter - Unsalted | Bread - Bistro White | Olive - Spread Tapenade | Dried Figs | Fish - Scallops, Cold Smoked | Cookies - Assorted | ... | Vanilla Beans | Wine - Fume Blanc Fetzer | Spice - Peppercorn Melange | Beef - Ground, Extra Lean, Fresh | Beef - Striploin Aa | Lettuce - Treviso | Veal - Inside | Water, Tap | Cheese - Taleggio D.o.p. | Juice - Orange |
| 1 | Muffin - Zero Transfat | Crackers - Trio | Wine - Toasted Head | Coconut - Shredded, Sweet | Cake - Mini Cheesecake | Knife Plastic - White | Apricots Fresh | Cake - Box Window 10x10x2.5 | Napkin White - Starched | Wine - Blue Nun Qualitatswein | ... | Ice Cream Bar - Hageen Daz To | Blackberries | French Pastry - Mini Chocolate | Tuna - Salad Premix | Zucchini - Yellow | Yogurt - Blueberry, 175 Gr | Wine - Red, Colio Cabernet | Bananas | Squid U5 - Thailand | Pork - Hock And Feet Attached |
| 2 | Banana Turning | Milk Powder | Snapple - Iced Tea Peach | Cheese - Cheddarsliced | Salmon - Atlantic, Skin On | Wonton Wrappers | Squid U5 - Thailand | Cassis | Mushroom - Porcini, Dry | Pickerel - Fillets | ... | Tomatoes Tear Drop | Coffee - Dark Roast | Beef - Tenderlion, Center Cut | Pastry - Raisin Muffin - Mini | Langers - Ruby Red Grapfruit | Bread - Calabrese Baguette | Soupfoamcont12oz 112con | Wine - Redchard Merritt | Peas - Pigeon, Dry | Longos - Grilled Chicken With |
| 3 | Crush - Cream Soda | Pail With Metal Handle 16l White | Garbag Bags - Black | Ocean Spray - Kiwi Strawberry | Wine - Hardys Bankside Shiraz | Broom - Corn | Pasta - Penne, Rigate, Dry | Foam Dinner Plate | Pork - Inside | Beer - Rickards Red | ... | Otomegusa Dashi Konbu | Hersey Shakes | Scallop - St. Jaques | Wine - Ruffino Chianti | Table Cloth 54x72 White | Extract - Lemon | Peas - Frozen | Chocolate - Compound Coating | Salmon Steak - Cohoe 8 Oz | Apricots - Dried |
| 4 | Veal - Osso Bucco | Potatoes - Idaho 100 Count | Hersey Shakes | Olives - Kalamata | Beans - Kidney, Red Dry | Meldea Green Tea Liquor | Cheese - Boursin, Garlic / Herbs | Rum - Coconut, Malibu | Wine - Pinot Noir Latour | Hickory Smoke, Liquid | ... | Cheese - Cottage Cheese | Potatoes - Idaho 100 Count | Water - Green Tea Refresher | Bananas | Beef - Rib Eye Aaa | Coffee - Irish Cream | Sprouts - Baby Pea Tendrils | Cumin - Whole | Sole - Dover, Whole, Fresh | Campari |
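
SciPy's `correlation` distance is one minus the Pearson correlation of the two vectors. It therefore compares the pattern of two customers' purchases rather than their raw quantities. The sketch below uses two hypothetical customers, one of whom buys exactly twice as much of every product. Their correlation distance is zero, while the Euclidean distance used earlier is not. This may partly explain why Euclidean similarities for heavy buyers, such as customer 98200 in Step 4, are uniformly small.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Two hypothetical customers over the same four products;
# b follows a's pattern but buys twice as much of everything.
a = np.array([1, 0, 2, 5])
b = 2 * a
pair = np.vstack([a, b])

d_corr = pdist(pair, "correlation")[0]
r = np.corrcoef(a, b)[0, 1]             # Pearson correlation coefficient
d_euclid = pdist(pair, "euclidean")[0]  # equals the length of a, sqrt(30)

print(d_corr, 1 - r, d_euclid)
```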


- Cityblock

In [50]:
recommendations_dict = {}
customer_similarity = pd.DataFrame(1 / (1 + squareform(pdist(matrix_products.T, metric = 'cityblock'))), index = matrix_products.columns, columns = matrix_products.columns)

for customer in unique_cust_ids:
    top_5 = customer_similarity[customer].drop(customer).nlargest(5)  # exclude the customer itself
    similar_customer = df_products.loc[list(top_5.index)]
    products_list = similar_customer.groupby('ProductName').agg('sum').sort_values('Quantity', ascending = False)
    merged_ranked_products = matrix_products.merge(products_list, on = 'ProductName')
    
    recommendations_dict[customer] = list(merged_ranked_products[merged_ranked_products[customer] == 0].nlargest(5, columns = 'Quantity')['Quantity'].index)
    
df_recommendations = pd.DataFrame.from_dict(recommendations_dict, orient = 'columns')
df_recommendations.head()

|   | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bay Leaf | Bay Leaf | Black Currants | Butter - Unsalted | Soup - Campbells Bean Medley | Cheese - Mozzarella | Cheese - Cambozola | Butter - Unsalted | Butter - Unsalted | Butter - Unsalted | ... | Bar - Granola Trail Mix Fruit Nut | Butter - Unsalted | Butter - Unsalted | Sole - Dover, Whole, Fresh | Wine - Blue Nun Qualitatswein | Wine - Chardonnay South | Wine - Chardonnay South | Bandage - Fexible 1x3 | Wine - White, Colubia Cresh | Wine - Blue Nun Qualitatswein |
| 1 | Chocolate - Dark | Muffin - Carrot Individual Wrap | Bread - Italian Roll With Herbs | Veal - Inside, Choice | Wine - Blue Nun Qualitatswein | Butter - Unsalted | Olive - Spread Tapenade | Cheese - Boursin, Garlic / Herbs | Knife Plastic - White | Ice Cream Bar - Hageen Daz To | ... | Wine - Blue Nun Qualitatswein | Bay Leaf | Cup - Translucent 7 Oz Clear | Beef - Chuck, Boneless | Lamb - Ground | Butter - Unsalted | Bar - Granola Trail Mix Fruit Nut | Butter - Unsalted | Beef - Montreal Smoked Brisket | Flavouring - Orange |
| 2 | Oranges - Navel, 72 | Pomello | Muffin - Carrot Individual Wrap | Beets - Candy Cane, Organic | Chocolate - Dark | Ice Cream Bar - Hageen Daz To | Pepper - Black, Whole | Ezy Change Mophandle | Peas - Pigeon, Dry | Lamb - Ground | ... | Beef Ground Medium | Bread Crumbs - Panko | Knife Plastic - White | Cornflakes | Ocean Spray - Ruby Red | Raspberries - Fresh | Pork - Hock And Feet Attached | Knife Plastic - White | Beef - Top Sirloin | Gloves - Goldtouch Disposable |
| 3 | Pecan Raisin - Tarts | Pork - Kidney | Pork - Kidney | Nut - Chestnuts, Whole | Pepper - Black, Whole | Knife Plastic - White | Sprouts - Baby Pea Tendrils | Pork - Kidney | Pepper - Black, Whole | Pepper - Black, Whole | ... | Butter - Unsalted | Juice - Lime | Muffin - Zero Transfat | Foam Dinner Plate | Pork - Kidney | Bar - Granola Trail Mix Fruit Nut | Apricots Fresh | Pepper - Black, Whole | Cheese - Boursin, Garlic / Herbs | Jagermeister |
| 4 | Wine - Charddonnay Errazuriz | Scampi Tail | Veal - Inside, Choice | Beans - Wax | Tart Shells - Sweet, 4 | Muffin Batt - Blueberry Passion | Veal - Inside, Choice | Wine - Crozes Hermitage E. | Soup - Campbells Bean Medley | Wine - Ej Gallo Sierra Valley | ... | Knife Plastic - White | Pasta - Angel Hair | Pork - Inside | Pepper - Black, Whole | Sherry - Dry | Beef - Ground Medium | Bandage - Fexible 1x3 | Sherry - Dry | Hersey Shakes | Knife Plastic - White |
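
Step 10 asks you to note the differences, but each cell overwrites `recommendations_dict`. Suppose you first save each metric's dictionary under its own name, for example `euclidean_recs = dict(recommendations_dict)`. These names are hypothetical. A small helper can then quantify how much two metrics agree. This sketch ignores rank order and compares only which products are recommended:

```python
def overlap(recs_a: dict, recs_b: dict) -> dict:
    """Hypothetical helper: per customer, the share of recommendations two runs agree on."""
    scores = {}
    for customer, products_a in recs_a.items():
        products_b = recs_b.get(customer, [])
        shared = set(products_a) & set(products_b)  # rank order is ignored
        # Divide by the longer list so 1.0 means identical sets and 0.0 means disjoint.
        scores[customer] = len(shared) / max(len(products_a), len(products_b), 1)
    return scores

# Hypothetical saved results from two metrics.
euclidean_recs = {33: ["A", "B", "C", "D", "E"], 200: ["F", "G", "H", "I", "J"]}
cityblock_recs = {33: ["A", "B", "X", "Y", "Z"], 200: ["J", "I", "H", "G", "F"]}

scores = overlap(euclidean_recs, cityblock_recs)
mean_overlap = sum(scores.values()) / len(scores)
print(scores, mean_overlap)
```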


- Cosine

In [52]:
recommendations_dict = {}
customer_similarity = pd.DataFrame(1 / (1 + squareform(pdist(matrix_products.T, metric = 'cosine'))), index = matrix_products.columns, columns = matrix_products.columns)

for customer in unique_cust_ids:
    top_5 = customer_similarity[customer].drop(customer).nlargest(5)  # exclude the customer itself
    similar_customer = df_products.loc[list(top_5.index)]
    products_list = similar_customer.groupby('ProductName').agg('sum').sort_values('Quantity', ascending = False)
    merged_ranked_products = matrix_products.merge(products_list, on = 'ProductName')
    
    recommendations_dict[customer] = list(merged_ranked_products[merged_ranked_products[customer] == 0].nlargest(5, columns = 'Quantity')['Quantity'].index)
    
df_recommendations = pd.DataFrame.from_dict(recommendations_dict, orient = 'columns')
df_recommendations.head()

|   | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Knife Plastic - White | Longos - Grilled Salmon With Bbq | Pickerel - Fillets | Bread - English Muffin | Salsify, Organic | Wonton Wrappers | Pasta - Penne, Rigate, Dry | Wine - Pinot Noir Latour | Fish - Scallops, Cold Smoked | Cookies - Assorted | ... | Vanilla Beans | Wine - Fume Blanc Fetzer | Spice - Peppercorn Melange | Sponge Cake Mix - Chocolate | Beef - Striploin Aa | Beef - Montreal Smoked Brisket | Sprouts - Baby Pea Tendrils | Water, Tap | Peas - Pigeon, Dry | Cheese - Cheddarsliced |
| 1 | Soup - Campbells, Beef Barley | Snapple Lemon Tea | Water - Mineral, Natural | Olive - Spread Tapenade | Durian Fruit | Bread - Bistro White | Olive - Spread Tapenade | Crab - Imitation Flakes | Napkin White - Starched | Wine - Blue Nun Qualitatswein | ... | Cheese - Cottage Cheese | Blackberries | French Pastry - Mini Chocolate | Bananas | Kellogs Special K Cereal | Spoon - Soup, Plastic | Juice - Orange | Bananas | Cheese - Taleggio D.o.p. | Cheese - Cottage Cheese |
| 2 | Onions - Cippolini | General Purpose Trigger | Snapple - Iced Tea Peach | Bagel - Plain | Wine - Hardys Bankside Shiraz | Sausage - Breakfast | Spinach - Baby | Cheese - Parmesan Cubes | Mushroom - Porcini, Dry | Pickerel - Fillets | ... | Tomatoes Tear Drop | Coffee - Dark Roast | Beef - Tenderlion, Center Cut | Initation Crab Meat | Langers - Ruby Red Grapfruit | Blackberries | Muffin Batt - Blueberry Passion | Wine - Redchard Merritt | Squid U5 - Thailand | Wine - Toasted Head |
| 3 | Tea - Herbal Sweet Dreams | Thyme - Lemon, Fresh | Wine - Ej Gallo Sierra Valley | Pork - Hock And Feet Attached | Bread - Raisin Walnut Oval | Beef - Montreal Smoked Brisket | Veal - Osso Bucco | Bouq All Italian - Primerba | Pork - Inside | Beer - Rickards Red | ... | Bananas | Hersey Shakes | Scallop - St. Jaques | Pork - Loin, Center Cut | Pears - Bosc | Cheese Cloth No 100 | Beer - Original Organic Lager | Chocolate - Compound Coating | Cheese - Brie, Triple Creme | Island Oasis - Mango Daiquiri |
| 4 | Banana Turning | Tomatoes Tear Drop | French Pastry - Mini Chocolate | Oil - Shortening - All - Purpose | Gatorade - Xfactor Berry | Garbage Bags - Clear | Carbonated Water - Blackcherry | Whmis - Spray Bottle Trigger | Wine - Pinot Noir Latour | Hickory Smoke, Liquid | ... | Ice Cream Bar - Hageen Daz To | Papayas | Water - Green Tea Refresher | Sole - Dover, Whole, Fresh | Beef - Rib Eye Aaa | Cake - Mini Cheesecake | Flavouring - Orange | Cumin - Whole | Crackers - Trio | Jolt Cola - Electric Blue |
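
Cosine distance also ignores overall purchase volume, but unlike correlation it does not subtract each customer's mean first. The sketch below uses hypothetical vectors. Scaling a profile leaves the cosine distance at zero. Adding one unit of every product changes the cosine distance but leaves the correlation distance at zero.

```python
import numpy as np
from scipy.spatial.distance import pdist

a = np.array([1, 0, 2, 5])  # hypothetical customer
scaled = 2 * a              # same pattern, twice the quantities
shifted = a + 1             # one extra unit of every product

d_cos_scaled = pdist(np.vstack([a, scaled]), "cosine")[0]
d_cos_shifted = pdist(np.vstack([a, shifted]), "cosine")[0]
d_corr_shifted = pdist(np.vstack([a, shifted]), "correlation")[0]

print(d_cos_scaled, d_cos_shifted, d_corr_shifted)
```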


- Jaccard

In [53]:
recommendations_dict = {}
customer_similarity = pd.DataFrame(1 / (1 + squareform(pdist(matrix_products.T, metric = 'jaccard'))), index = matrix_products.columns, columns = matrix_products.columns)

for customer in unique_cust_ids:
    top_5 = customer_similarity[customer].drop(customer).nlargest(5)  # exclude the customer itself
    similar_customer = df_products.loc[list(top_5.index)]
    products_list = similar_customer.groupby('ProductName').agg('sum').sort_values('Quantity', ascending = False)
    merged_ranked_products = matrix_products.merge(products_list, on = 'ProductName')
    
    recommendations_dict[customer] = list(merged_ranked_products[merged_ranked_products[customer] == 0].nlargest(5, columns = 'Quantity')['Quantity'].index)
    
df_recommendations = pd.DataFrame.from_dict(recommendations_dict, orient = 'columns')
df_recommendations.head()

|   | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Loquat | Muffin - Carrot Individual Wrap | Sauce - Demi Glace | Veal - Inside, Choice | Cumin - Whole | Cheese - Mozzarella | Cheese - Parmesan Grated | Bacardi Breezer - Tropical | Bread - Italian Roll With Herbs | Cod - Black Whole Fillet | ... | Beef - Ground, Extra Lean, Fresh | Juice - Orange | Coffee - Irish Cream | Bacardi Breezer - Tropical | Fenngreek Seed | Tray - 16in Rnd Blk | Spinach - Baby | Apricots - Halves | Coffee - Irish Cream | Muffin - Carrot Individual Wrap |
| 1 | Bananas | Beans - Kidney, Canned | Flavouring - Orange | Beef - Inside Round | Blueberries | Sausage - Breakfast | Ice Cream Bar - Hageen Daz To | Mushrooms - Black, Dried | Chestnuts - Whole,canned | Beans - Wax | ... | Berry Brulee | Rum - Mount Gay Eclipes | Beef - Striploin Aa | Beer - Alexander Kieths, Pale Ale | Bread - Italian Corn Meal Poly | Pastry - Choclate Baked | Beef - Ground Medium | Chocolate - Compound Coating | Soup V8 Roasted Red Pepper | Beef - Top Sirloin |
| 2 | Blueberries | Pomello | Wine - Two Oceans Cabernet | Bread - Italian Roll With Herbs | Chocolate - Dark | Assorted Desserts | Sugar - Fine | Soup Knorr Chili With Beans | Tea - Jasmin Green | Smirnoff Green Apple Twist | ... | Cheese - Parmesan Grated | Spice - Peppercorn Melange | Browning Caramel Glace | Cheese - Brie,danish | Pastry - Choclate Baked | Pernod | Bread - French Baquette | Mustard Prepared | Bandage - Flexible Neon | Browning Caramel Glace |
| 3 | Cheese - Brie, Triple Creme | Berry Brulee | Beer - Blue | Lettuce - Spring Mix | Guinea Fowl | Bananas | Wine - Gato Negro Cabernet | Beef - Ground, Extra Lean, Fresh | Appetizer - Sausage Rolls | Appetizer - Sausage Rolls | ... | Flour - Whole Wheat | Apricots - Halves | Cheese - Brie,danish | Fenngreek Seed | Vinegar - Sherry | Banana Turning | Cheese - Parmesan Cubes | Appetizer - Sausage Rolls | Blackberries | Isomalt |
| 4 | Cheese Cloth No 100 | Chips Potato Salt Vinegar 43g | Beer - Original Organic Lager | Pepper - Black, Whole | Lettuce - Spring Mix | Cheese - Cottage Cheese | Beer - Blue | Bread - Italian Corn Meal Poly | Beef - Striploin Aa | Bacardi Breezer - Tropical | ... | Langers - Ruby Red Grapfruit | Beef - Ground Medium | Cream Of Tartar | Garlic - Primerba, Paste | Beef - Montreal Smoked Brisket | Beans - Kidney, Canned | Tea - Jasmin Green | Assorted Desserts | Crab - Imitation Flakes | Mussels - Cultivated |
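
SciPy documents `jaccard` in terms of boolean vectors. Given the raw quantities in `matrix_products`, it counts a product as a disagreement whenever two customers' quantities differ, even if both bought it. If the intent is set overlap (which products each customer bought at all), one option is to binarize the matrix first. The sketch below uses hypothetical vectors to show the difference:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Two hypothetical customers who bought the same two products, in different quantities.
counts = np.array([
    [1, 2, 0],
    [1, 3, 0],
])

# On numeric input, a position counts as a mismatch whenever the values differ
# (among positions where at least one customer is nonzero): here 1 of 2.
d_counts = pdist(counts, "jaccard")[0]

# Comparing bought / not bought instead gives the set-based Jaccard distance: here 0.
d_presence = pdist(counts > 0, "jaccard")[0]

print(d_counts, d_presence)
```

In the lab, the binarized version would be `pdist(matrix_products.T > 0, metric = 'jaccard')`.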
