# Intro to Recommender Systems Lab

Complete the exercises below to solidify your knowledge and understanding of recommender systems.

For this lab, we are going to build a user-similarity-based recommender system step by step. Our data set contains customer grocery purchases, and we will use similar purchase behavior to inform our recommender system. The system will generate 5 recommendations for each customer based on the purchases they have made.

In [1]:
import pandas as pd
from scipy.spatial.distance import pdist, squareform
print('libraries imported')

libraries imported


In [2]:
data = pd.read_csv('../data/customer_product_sales.csv')

In [3]:
data.head()

|   | CustomerID | FirstName | LastName | SalesID | ProductID | ProductName | Quantity |
|---|---|---|---|---|---|---|---|
| 0 | 61288 | Rosa | Andersen | 134196 | 229 | Bread - Hot Dog Buns | 16 |
| 1 | 77352 | Myron | Murray | 6167892 | 229 | Bread - Hot Dog Buns | 20 |
| 2 | 40094 | Susan | Stevenson | 5970885 | 229 | Bread - Hot Dog Buns | 11 |
| 3 | 23548 | Tricia | Vincent | 6426954 | 229 | Bread - Hot Dog Buns | 6 |
| 4 | 78981 | Scott | Burch | 819094 | 229 | Bread - Hot Dog Buns | 20 |


In [4]:
data.sort_values('CustomerID', ascending=True).head(5)

|   | CustomerID | FirstName | LastName | SalesID | ProductID | ProductName | Quantity |
|---|---|---|---|---|---|---|---|
| 14280 | 33 | Lindsay | Santana | 2005605 | 162 | Sauce - Demi Glace | 1 |
| 3012 | 33 | Lindsay | Santana | 5638266 | 214 | French Pastry - Mini Chocolate | 1 |
| 18070 | 33 | Lindsay | Santana | 5056183 | 387 | Fondant - Icing | 1 |
| 4118 | 33 | Lindsay | Santana | 1888258 | 53 | Cassis | 1 |
| 53450 | 33 | Lindsay | Santana | 140335 | 245 | Grouper - Fresh | 1 |


## Step 1: Create a data frame that contains the total quantity of each product purchased by each customer.

You will need to group by CustomerID and ProductName and then sum the Quantity field.

In [5]:
productos = pd.DataFrame(data.groupby(['CustomerID','ProductName'])['Quantity'].sum())
productos.head()

| CustomerID | ProductName | Quantity |
|---|---|---|
| 33 | Apricots - Dried | 1 |
| 33 | Assorted Desserts | 1 |
| 33 | Bandage - Flexible Neon | 1 |
| 33 | Bar Mix - Pina Colada, 355 Ml | 1 |
| 33 | Beans - Kidney, Canned | 1 |


## Step 2: Use the `pivot_table` method to create a product by customer matrix.

The rows of the matrix should represent the products, the columns should represent the customers, and the values should be the quantities of each product purchased by each customer. You will also need to replace nulls with zeros, which you can do using the `fillna` method.

In [6]:
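# Products as rows, customers as columns; aggfunc='sum' totals repeat purchases (the default would average them)
tablita = pd.pivot_table(data, values = 'Quantity', index = ['ProductName'], columns=['CustomerID'], aggfunc='sum').fillna(0)
#tablita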
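Note that `pivot_table` aggregates duplicate (product, customer) pairs with `aggfunc='mean'` unless told otherwise, so a customer who bought the same product in several sales would get the average quantity instead of the total that Step 2 asks for. A minimal sketch on a hypothetical toy frame (not the lab data) shows the difference:

```python
import pandas as pd

# Hypothetical toy purchases: customer 1 bought Milk in two separate sales.
purchases = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "ProductName": ["Milk", "Milk", "Bread", "Bread"],
    "Quantity": [2, 4, 1, 3],
})

# Default aggfunc ("mean"): customer 1's two Milk rows collapse to (2 + 4) / 2.
mean_matrix = pd.pivot_table(purchases, values="Quantity", index="ProductName",
                             columns="CustomerID").fillna(0)

# aggfunc="sum": customer 1's Milk total is 2 + 4, which is what Step 2 wants.
sum_matrix = pd.pivot_table(purchases, values="Quantity", index="ProductName",
                            columns="CustomerID", aggfunc="sum").fillna(0)

print(mean_matrix)
print(sum_matrix)
```

Pivoting the Step 1 totals (`productos.reset_index()`) instead of the raw `data` would also avoid the issue, since each (customer, product) pair then appears only once.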

## Step 3: Create a customer similarity matrix using `squareform` and `pdist`. For the distance metric, choose "euclidean."

In [7]:
cuadradoeuclides = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'euclidean'))), index = tablita.columns, columns = tablita.columns)
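Step 3 converts each pairwise distance d into a similarity with 1 / (1 + d): identical purchase vectors (d = 0) score exactly 1, and the score shrinks toward 0 as customers drift apart. The sketch below applies the same `pdist`/`squareform` pattern to a hypothetical three-product, three-customer matrix laid out like `tablita` (products as rows, customers as columns).

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical product-by-customer matrix: rows are products, columns are customers.
matrix = pd.DataFrame(
    {"A": [1, 0, 2], "B": [1, 0, 2], "C": [4, 4, 2]},
    index=["Milk", "Bread", "Eggs"],
)

# pdist compares rows, so transpose to get one row per customer,
# then expand the condensed distances into a square matrix.
distances = squareform(pdist(matrix.T, "euclidean"))

# Map each distance d to a similarity 1 / (1 + d).
similarity = pd.DataFrame(1 / (1 + distances),
                          index=matrix.columns, columns=matrix.columns)
print(similarity.round(3))
```

Here A and B have identical vectors, so their similarity is 1; A and C are at Euclidean distance 5 (from differences 3, 4 and 0), giving 1/6.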

In [8]:
cuadradoeuclides.head(6)

| CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1.0 | 0.085297 | 0.093953 | 0.091747 | 0.08741 | 0.089695 | 0.085297 | 0.088913 | 0.088152 | 0.089695 | ... | 0.004809 | 0.005108 | 0.004996 | 0.005421 | 0.00492 | 0.005023 | 0.00488 | 0.005026 | 0.004549 | 0.004883 |
| 200 | 0.085297 | 1.0 | 0.085638 | 0.085297 | 0.08007 | 0.08302 | 0.084959 | 0.083651 | 0.085638 | 0.087047 | ... | 0.004825 | 0.005121 | 0.005014 | 0.005448 | 0.004925 | 0.005032 | 0.004909 | 0.005042 | 0.004553 | 0.004879 |
| 264 | 0.093953 | 0.085638 | 1.0 | 0.088152 | 0.089301 | 0.087047 | 0.085638 | 0.086333 | 0.087047 | 0.087047 | ... | 0.004822 | 0.005115 | 0.004996 | 0.005441 | 0.004932 | 0.005055 | 0.004894 | 0.005042 | 0.004566 | 0.004883 |
| 356 | 0.091747 | 0.085297 | 0.088152 | 1.0 | 0.085983 | 0.086688 | 0.085983 | 0.091325 | 0.085983 | 0.08741 | ... | 0.004814 | 0.005111 | 0.004999 | 0.005437 | 0.00492 | 0.005036 | 0.004871 | 0.005042 | 0.004563 | 0.004886 |
| 412 | 0.08741 | 0.08007 | 0.089301 | 0.085983 | 1.0 | 0.085638 | 0.085638 | 0.089301 | 0.084959 | 0.087779 | ... | 0.004808 | 0.005131 | 0.004996 | 0.005441 | 0.004925 | 0.005042 | 0.004876 | 0.005039 | 0.004568 | 0.004903 |
| 464 | 0.089695 | 0.08302 | 0.087047 | 0.086688 | 0.085638 | 1.0 | 0.085638 | 0.09261 | 0.087779 | 0.087047 | ... | 0.004814 | 0.005121 | 0.004993 | 0.005445 | 0.00492 | 0.005042 | 0.004877 | 0.005039 | 0.004556 | 0.004897 |


## Step 4: Check your results by generating a list of the top 5 most similar customers for a specific CustomerID.

In [9]:
lista = cuadradoeuclides.sort_values([33], ascending=False).head(6).reset_index()
lista

|   | CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33 | 1.0 | 0.085297 | 0.093953 | 0.091747 | 0.08741 | 0.089695 | 0.085297 | 0.088913 | 0.088152 | ... | 0.004809 | 0.005108 | 0.004996 | 0.005421 | 0.00492 | 0.005023 | 0.00488 | 0.005026 | 0.004549 | 0.004883 |
| 1 | 3909 | 0.095358 | 0.089695 | 0.088913 | 0.08853 | 0.088913 | 0.088152 | 0.088913 | 0.090499 | 0.088152 | ... | 0.004809 | 0.005118 | 0.00499 | 0.005433 | 0.00492 | 0.005036 | 0.00488 | 0.00502 | 0.004554 | 0.004889 |
| 2 | 3531 | 0.093953 | 0.084297 | 0.086333 | 0.088913 | 0.085638 | 0.087779 | 0.087047 | 0.086333 | 0.089301 | ... | 0.004814 | 0.005125 | 0.00499 | 0.005437 | 0.004923 | 0.005049 | 0.004891 | 0.005042 | 0.004551 | 0.004877 |
| 3 | 264 | 0.093953 | 0.085638 | 1.0 | 0.088152 | 0.089301 | 0.087047 | 0.085638 | 0.086333 | 0.087047 | ... | 0.004822 | 0.005115 | 0.004996 | 0.005441 | 0.004932 | 0.005055 | 0.004894 | 0.005042 | 0.004566 | 0.004883 |
| 4 | 2503 | 0.093498 | 0.088152 | 0.088913 | 0.09261 | 0.088152 | 0.093953 | 0.088913 | 0.093051 | 0.093051 | ... | 0.004815 | 0.005109 | 0.004994 | 0.005422 | 0.004912 | 0.005043 | 0.004878 | 0.00503 | 0.004559 | 0.004869 |
| 5 | 3305 | 0.093051 | 0.084959 | 0.090909 | 0.093953 | 0.087047 | 0.08853 | 0.087779 | 0.09261 | 0.089301 | ... | 0.004812 | 0.005112 | 0.005003 | 0.005422 | 0.004917 | 0.005036 | 0.004871 | 0.005036 | 0.004559 | 0.00488 |


In [10]:
iguales = list(lista.CustomerID[1:])

In [11]:
iguales

[3909, 3531, 264, 2503, 3305]
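Step 4 relies on the chosen customer sorting first (similarity 1.0) and slices it off with `[1:]`. If another customer had an identical purchase vector, the tie could push the chosen customer out of row 0. A minimal sketch that drops the customer explicitly and uses `Series.nlargest`, shown on a hypothetical similarity matrix; `most_similar` is an illustrative name, not part of the lab:

```python
import pandas as pd

# Hypothetical symmetric similarity matrix (customer IDs as index and columns).
ids = [10, 20, 30, 40]
sim = pd.DataFrame(
    [[1.0, 0.9, 0.2, 0.5],
     [0.9, 1.0, 0.3, 0.4],
     [0.2, 0.3, 1.0, 0.1],
     [0.5, 0.4, 0.1, 1.0]],
    index=ids, columns=ids,
)

def most_similar(sim, customer_id, n=5):
    """Return up to n customer IDs most similar to customer_id, excluding itself."""
    scores = sim[customer_id].drop(customer_id)  # remove the self-similarity entry
    return scores.nlargest(n).index.tolist()

print(most_similar(sim, 10, n=2))  # [20, 40]
```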

## Step 5: From the data frame you created in Step 1, select the records for the list of similar CustomerIDs you obtained in Step 4.

In [12]:
# Data frame created in Step 1
#productos.head()

In [13]:
productos2 = productos.reset_index()
productos2.set_index(['CustomerID'], inplace = True)
productos2

| CustomerID | ProductName | Quantity |
|---|---|---|
| 33 | Apricots - Dried | 1 |
| 33 | Assorted Desserts | 1 |
| 33 | Bandage - Flexible Neon | 1 |
| 33 | Bar Mix - Pina Colada, 355 Ml | 1 |
| 33 | Beans - Kidney, Canned | 1 |
| 33 | Beef - Chuck, Boneless | 1 |
| 33 | Beef - Prime Rib Aaa | 1 |
| 33 | Beer - Original Organic Lager | 1 |
| 33 | Beer - Rickards Red | 1 |
| 33 | Black Currants | 1 |


In [14]:
# Check whether each CustomerID in the index of productos2 is in iguales, and use loc to keep only the matching rows in acotado
acotado = productos2.loc[productos2.index.isin(iguales)]

In [15]:
# Check that the CustomerIDs are the ones we need
acotado.head()

| CustomerID | ProductName | Quantity |
|---|---|---|
| 264 | Apricots - Halves | 1 |
| 264 | Apricots Fresh | 1 |
| 264 | Bacardi Breezer - Tropical | 1 |
| 264 | Bagel - Plain | 1 |
| 264 | Banana - Leaves | 1 |
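`productos` from Step 1 already carries `CustomerID` as the first level of its MultiIndex, so the reset and re-index in Step 5 can be skipped by filtering on that level directly. A sketch on a hypothetical toy frame built the same way as `productos`:

```python
import pandas as pd

# Hypothetical raw purchases
purchases = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3],
    "ProductName": ["Milk", "Bread", "Milk", "Eggs", "Milk"],
    "Quantity": [1, 2, 3, 4, 5],
})
# Same shape as productos: MultiIndex (CustomerID, ProductName) -> Quantity
totals = purchases.groupby(["CustomerID", "ProductName"])[["Quantity"]].sum()

similar_ids = [2, 3]
# Boolean mask on the CustomerID level keeps only the similar customers' rows
subset = totals[totals.index.get_level_values("CustomerID").isin(similar_ids)]
print(subset)
```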


## Step 6: Aggregate those customer purchase records by ProductName, sum the Quantity field, and then rank them in descending order by quantity.

This gives the total quantity of each product purchased by the 5 customers most similar to the selected customer, ordered from most to least purchased.

In [24]:
simi = pd.DataFrame(acotado.groupby(['ProductName'])['Quantity'].sum().sort_values(ascending=False))
simi.reset_index(inplace=True)

In [17]:
print(simi.ProductName)

0                      Salsify, Organic
1          Wine - Charddonnay Errazuriz
2         Wine - Ej Gallo Sierra Valley
3                   Oranges - Navel, 72
4                       Quiche Assorted
5                              Bay Leaf
6                  Pecan Raisin - Tarts
7                      Chocolate - Dark
8                Towels - Paper / Kraft
9                           Tofu - Firm
10                        Sauce - Rosee
11                             Sardines
12        Pork - Hock And Feet Attached
13                       Black Currants
14                           Sauerkraut
15                     Eggplant - Asian
16                     V8 - Berry Blend
17                        Chef Hat 20cm
18          Beets - Candy Cane, Organic
19            Ice Cream Bar - Oreo Cone
20     Oil - Shortening - All - Purpose
21                      Sausage - Liver
22                           Dried Figs
23                  Squid U5 - Thailand
24                          Bread - Rye
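Many products share the same total quantity, and `sort_values` uses a non-stable sort by default, so the order among tied products (and therefore which of them reach the top 5) is not guaranteed to be meaningful. A minimal sketch, on hypothetical data, that adds `ProductName` as a secondary sort key so the ranking is reproducible:

```python
import pandas as pd

# Hypothetical purchases by the similar customers (shape of the Step 5 output)
subset = pd.DataFrame({
    "ProductName": ["Tofu", "Bay Leaf", "Tofu", "Sardines", "Bay Leaf", "Apples"],
    "Quantity": [1, 2, 1, 2, 1, 3],
})

ranked = (
    subset.groupby("ProductName", as_index=False)["Quantity"].sum()
    # Highest total first; ties broken alphabetically so the order is reproducible
    .sort_values(["Quantity", "ProductName"], ascending=[False, True])
    .reset_index(drop=True)
)
print(ranked)
```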


## Step 7: Filter the list for products that the chosen customer has not yet purchased and then recommend the top 5 products with the highest quantities that are left.

- Merge the ranked products data frame with the customer product matrix on the ProductName field.
- Filter for records where the chosen customer has not purchased the product.
- Show the top 5 results.
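The three steps above can also be done with a single merge: `indicator=True` adds a `_merge` column that is `left_only` for products the chosen customer never bought. A sketch on hypothetical toy frames (the names `ranked` and `bought` are illustrative):

```python
import pandas as pd

# Hypothetical ranked products from the similar customers (Step 6 output)
ranked = pd.DataFrame({
    "ProductName": ["Apples", "Bay Leaf", "Sardines", "Tofu", "Milk", "Eggs", "Rye"],
    "Quantity": [5, 4, 3, 3, 2, 2, 1],
})
# Hypothetical products the chosen customer has already bought
bought = pd.DataFrame({"ProductName": ["Bay Leaf", "Milk"]})

# A left merge keeps the ranked order; _merge marks rows with no match in `bought`
merged = ranked.merge(bought, on="ProductName", how="left", indicator=True)
not_bought = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
top5 = not_bought["ProductName"].head(5).tolist()
print(top5)  # ['Apples', 'Sardines', 'Tofu', 'Eggs', 'Rye']
```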

In [18]:
treintaytres = data[data['CustomerID']==33]
productos33 = pd.DataFrame(treintaytres.groupby('ProductName')['Quantity'].sum())
productos33.reset_index(inplace = True)

In [19]:
df_inner = pd.merge(simi, productos33, on='ProductName', how='left')

In [20]:
df_inner

|   | ProductName | Quantity_x | Quantity_y |
|---|---|---|---|
| 0 | Salsify, Organic | 4 | 2.0 |
| 1 | Wine - Charddonnay Errazuriz | 3 | NaN |
| 2 | Wine - Ej Gallo Sierra Valley | 3 | NaN |
| 3 | Oranges - Navel, 72 | 3 | NaN |
| 4 | Quiche Assorted | 3 | 1.0 |
| 5 | Bay Leaf | 3 | NaN |
| 6 | Pecan Raisin - Tarts | 3 | NaN |
| 7 | Chocolate - Dark | 3 | NaN |
| 8 | Towels - Paper / Kraft | 2 | 1.0 |
| 9 | Tofu - Firm | 2 | NaN |


In [45]:
nocomprados = df_inner[df_inner.Quantity_y.isna()]
nocomprados2 = nocomprados.reset_index()
nocomprados3 = nocomprados2.drop(columns = ['index', 'Quantity_y'])
nocomprados3

|   | ProductName | Quantity_x |
|---|---|---|
| 0 | Muffin - Carrot Individual Wrap | 5 |
| 1 | Cheese - Boursin, Garlic / Herbs | 4 |
| 2 | Pork - Kidney | 4 |
| 3 | Pasta - Detalini, White, Fresh | 3 |
| 4 | Raspberries - Fresh | 3 |
| 5 | Beans - Kidney, Canned | 3 |
| 6 | Sausage - Liver | 3 |
| 7 | Ice Cream Bar - Hageen Daz To | 3 |
| 8 | Kellogs Special K Cereal | 3 |
| 9 | Beef Ground Medium | 3 |


In [29]:
top5 = nocomprados3.head(5)
top5

|   | ProductName | Quantity_x |
|---|---|---|
| 0 | Wine - Charddonnay Errazuriz | 3 |
| 1 | Wine - Ej Gallo Sierra Valley | 3 |
| 2 | Oranges - Navel, 72 | 3 |
| 3 | Bay Leaf | 3 |
| 4 | Pecan Raisin - Tarts | 3 |


In [38]:
# Collect the five recommendations for customer 33, keyed 0-4
recomendaciones = dict()
for i in range(5):
    recomendaciones[i] = top5.ProductName[i]

In [39]:
recomendaciones

{0: 'Wine - Charddonnay Errazuriz',
 1: 'Wine - Ej Gallo Sierra Valley',
 2: 'Oranges - Navel, 72',
 3: 'Bay Leaf',
 4: 'Pecan Raisin - Tarts'}

## Step 8: Now that we have generated product recommendations for a single user, put the pieces together and iterate over a list of all CustomerIDs.

- Create an empty dictionary that will hold the recommendations for all customers.
- Create a list of unique CustomerIDs to iterate over.
- Iterate over the customer list performing steps 4 through 7 for each and appending the results of each iteration to the dictionary you created.
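Wrapping Steps 4 to 7 in one function keeps the loop body short and lets the same code run against any similarity matrix in Step 10. A minimal sketch; the function name `recommend`, its arguments, and the toy data (standing in for `productos.reset_index()` and a similarity matrix) are all hypothetical:

```python
import pandas as pd

def recommend(sim, totals, customer_id, n_neighbors=5, n_items=5):
    """Recommend up to n_items products bought by the customer's nearest
    neighbours but not by the customer.

    sim    -- square similarity DataFrame indexed and columned by CustomerID
    totals -- DataFrame with columns CustomerID, ProductName, Quantity
    """
    # Step 4: nearest neighbours, excluding the customer itself
    neighbours = sim[customer_id].drop(customer_id).nlargest(n_neighbors).index
    # Steps 5-6: total quantity per product across the neighbours
    ranked = (totals[totals["CustomerID"].isin(neighbours)]
              .groupby("ProductName")["Quantity"].sum()
              .sort_values(ascending=False))
    # Step 7: drop anything the customer already bought
    owned = set(totals.loc[totals["CustomerID"] == customer_id, "ProductName"])
    return [p for p in ranked.index if p not in owned][:n_items]

# Hypothetical toy data
totals = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3, 3],
    "ProductName": ["Milk", "Bread", "Milk", "Eggs", "Jam", "Tea", "Eggs"],
    "Quantity": [1, 1, 2, 3, 1, 1, 1],
})
ids = [1, 2, 3]
sim = pd.DataFrame([[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]],
                   index=ids, columns=ids)

print(recommend(sim, totals, 1, n_neighbors=1))  # ['Eggs', 'Jam']
```

Because it returns a plain list, a customer with fewer than five candidate products simply gets a shorter list instead of raising a `KeyError`.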

In [47]:
recomendaciones=dict()
listarecomendaciones= []
clientes = data.CustomerID.unique()

In [49]:
# Step 1 totals indexed by CustomerID; this does not change between customers
productos2 = productos.reset_index()
productos2.set_index(['CustomerID'], inplace = True)

for cliente in clientes:
    # Step 4: the 5 most similar customers (row 0 is the customer itself)
    lista = cuadradoeuclides.sort_values([cliente], ascending=False).head(6).reset_index()
    iguales = list(lista.CustomerID[1:])
    # Step 5: purchases made by those similar customers
    acotado = productos2.loc[productos2.index.isin(iguales)]
    # Step 6: total quantity per product, most purchased first
    simi = pd.DataFrame(acotado.groupby(['ProductName'])['Quantity'].sum().sort_values(ascending=False))
    simi.reset_index(inplace=True)
    # Step 7: products this customer has already bought
    treintaytres = data[data['CustomerID']==cliente]
    productos33 = pd.DataFrame(treintaytres.groupby('ProductName')['Quantity'].sum())
    productos33.reset_index(inplace = True)
    # Keep only products with no match in the customer's own purchases
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()]
    nocomprados3 = nocomprados.reset_index(drop=True).drop(columns = ['Quantity_y'])
    top5 = nocomprados3.head(5)
    # One record per customer: CustomerID plus recommendations keyed 0-4
    # (enumerate also copes with customers who have fewer than 5 candidates)
    recomendaciones = {'CustomerID': cliente}
    recomendaciones.update(enumerate(top5.ProductName))
    listarecomendaciones.append(recomendaciones)
    

## Step 9: Store the results in a Pandas data frame. The data frame should have a column for Customer ID and then a column for each of the 5 product recommendations for each customer.

In [52]:
tonio = pd.DataFrame(listarecomendaciones)
tonio.set_index('CustomerID', inplace = True)

In [53]:
tonio

| CustomerID | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 61288 | Mushrooms - Black, Dried | Wine - Magnotta, Merlot Sr Vqa | Chicken - Soup Base | Wine - Chardonnay South | Milk - 1% |
| 77352 | Guinea Fowl | Grenadine | Oranges - Navel, 72 | Ecolab - Mikroklene 4/4 L | Shrimp - Baby, Warm Water |
| 40094 | Water - Mineral, Natural | Oregano - Dry, Rubbed | Pasta - Orecchiette | Quiche Assorted | Tuna - Salad Premix |
| 23548 | Wanton Wrap | Banana Turning | Flavouring - Orange | Chocolate - Semi Sweet, Calets | Lettuce - Treviso |
| 78981 | Lettuce - Frisee | Longos - Chicken Wings | Pop Shoppe Cream Soda | Beef - Inside Round | Sprouts - Alfalfa |
| 83106 | Cheese - Boursin, Garlic / Herbs | Garlic - Peeled | Soup - Campbells Tomato Ravioli | Beef - Rib Eye Aaa | Wine - Chablis 2003 Champs |
| 11253 | Smirnoff Green Apple Twist | Lettuce - Treviso | Tomatoes Tear Drop | Juice - Cranberry, 341 Ml | Cumin - Whole |
| 35107 | Flavouring - Orange | Ice Cream Bar - Hageen Daz To | Fuji Apples | Dc Hikiage Hira Huba | Beef - Prime Rib Aaa |
| 15088 | Smirnoff Green Apple Twist | Blueberries | Watercress | Curry Paste - Madras | Pepper - Black, Whole |
| 26031 | Muffin Batt - Choc Chk | Beef - Top Sirloin | Cheese - Mozzarella | Olive - Spread Tapenade | Juice - Apple Cider |
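The recommendation columns in this frame carry bare labels 0 to 4. A sketch that builds the Step 9 frame from a hypothetical `{CustomerID: [products]}` mapping, gives the columns descriptive names, and tolerates customers with fewer than five recommendations (missing slots are left empty); the product lists below are made up:

```python
import pandas as pd

# Hypothetical recommendations: CustomerID -> ranked product list
recs = {
    33: ["Wine", "Oranges", "Bay Leaf", "Tarts", "Tofu"],
    200: ["Grenadine", "Guinea Fowl"],  # fewer than 5 candidates
}

# orient="index" makes each key a row; shorter lists are padded with missing values
table = pd.DataFrame.from_dict(recs, orient="index")
table.columns = [f"Recommendation {i + 1}" for i in range(table.shape[1])]
table.index.name = "CustomerID"
print(table)
```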


## Step 10: Change the distance metric used in Step 3 to something other than euclidean (correlation, cityblock, cosine, jaccard, etc.). Regenerate the recommendations for all customers and note the differences.
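Each metric below repeats the same two cells with only the metric name changed. A small helper that takes the metric as a parameter keeps the comparison in one place; `similarity_matrix` is a hypothetical name, and the metric strings are ones `scipy.spatial.distance.pdist` accepts. The toy matrix is made up.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def similarity_matrix(product_by_customer, metric="euclidean"):
    """Customer-by-customer similarity 1 / (1 + d) for a given pdist metric."""
    d = squareform(pdist(product_by_customer.T, metric))  # one row per customer
    cols = product_by_customer.columns
    return pd.DataFrame(1 / (1 + d), index=cols, columns=cols)

# Hypothetical product-by-customer matrix; B bought exactly twice what A bought
matrix = pd.DataFrame({"A": [1, 0, 2], "B": [2, 0, 4], "C": [0, 3, 0]},
                      index=["Milk", "Bread", "Eggs"])

for metric in ["euclidean", "cityblock", "cosine"]:
    sim = similarity_matrix(matrix, metric)
    print(metric, round(sim.loc["A", "B"], 3))
```

The toy case shows why metrics can disagree: cosine ignores overall magnitude, so A and B look identical to it (similarity 1), while euclidean and cityblock penalise the difference in volume.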

In [63]:
cuadradobraycurtis = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'braycurtis'))), index = tablita.columns, columns = tablita.columns)

In [87]:
cuadradobraycurtis.head(5)

| CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1.0 | 0.53252 | 0.565421 | 0.550459 | 0.538136 | 0.540179 | 0.524793 | 0.527027 | 0.530702 | 0.536036 | ... | 0.50197 | 0.501583 | 0.502423 | 0.501067 | 0.502352 | 0.501227 | 0.502027 | 0.501535 | 0.501517 | 0.502317 |
| 200 | 0.53252 | 1.0 | 0.540323 | 0.53629 | 0.514706 | 0.523438 | 0.546875 | 0.52 | 0.540323 | 0.545455 | ... | 0.503643 | 0.502834 | 0.504222 | 0.503541 | 0.502921 | 0.502134 | 0.504913 | 0.503054 | 0.502011 | 0.502011 |
| 264 | 0.565421 | 0.540323 | 1.0 | 0.534783 | 0.555556 | 0.529915 | 0.532787 | 0.517241 | 0.529915 | 0.525862 | ... | 0.50338 | 0.502215 | 0.502418 | 0.502849 | 0.503529 | 0.504313 | 0.503478 | 0.503073 | 0.503293 | 0.502313 |
| 356 | 0.550459 | 0.53629 | 0.534783 | 1.0 | 0.533058 | 0.525641 | 0.533058 | 0.545872 | 0.521186 | 0.526087 | ... | 0.502532 | 0.501899 | 0.502724 | 0.502493 | 0.502349 | 0.502457 | 0.501155 | 0.503075 | 0.50304 | 0.502606 |
| 412 | 0.538136 | 0.514706 | 0.555556 | 0.533058 | 1.0 | 0.532787 | 0.544 | 0.547826 | 0.528455 | 0.542373 | ... | 0.50196 | 0.503795 | 0.50241 | 0.502837 | 0.502927 | 0.503062 | 0.501727 | 0.502754 | 0.503537 | 0.50434 |
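Bray-Curtis distance is sum(|u_i - v_i|) / sum(|u_i + v_i|). For non-negative quantities it lies between 0 and 1, so 1 / (1 + d) always falls between 0.5 and 1, which is why every value in the table above is at least 0.5. A quick check on hypothetical vectors, comparing a manual computation with `scipy.spatial.distance.braycurtis`:

```python
from scipy.spatial.distance import braycurtis

u = [1, 0, 2]  # hypothetical quantities for one customer
v = [0, 3, 2]  # hypothetical quantities for another

# Manual Bray-Curtis: sum |u - v| / sum |u + v|  ->  (1 + 3 + 0) / (1 + 3 + 4)
manual = sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))
print(manual, braycurtis(u, v))  # 0.5 0.5

# Customers with no products in common reach the maximum distance of 1,
# i.e. the minimum similarity 1 / (1 + 1) = 0.5
print(braycurtis([1, 0], [0, 1]))  # 1.0
```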


In [112]:
Rbraycurtis=dict()
lbraycurtis= []
clientes = data.CustomerID.unique()

for cliente in clientes:
    lista = cuadradobraycurtis.sort_values([cliente], ascending=False).head(6).reset_index()
    iguales = list(lista.CustomerID[1:])
    productos2 = productos.reset_index()
    productos2.set_index(['CustomerID'], inplace = True)
    acotado = productos2.loc[productos2.index.isin(iguales)]
    simi = pd.DataFrame(acotado.groupby(['ProductName'])['Quantity'].sum().sort_values(ascending=False))
    simi.reset_index(inplace=True)
    treintaytres = data[data['CustomerID']==cliente]
    productos33 = pd.DataFrame(treintaytres.groupby('ProductName')['Quantity'].sum())
    productos33.reset_index(inplace = True)
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()]
    nocomprados2 = nocomprados.reset_index()
    nocomprados3 = nocomprados2.drop(columns = ['index', 'Quantity_y'])
    top5 = nocomprados3.head(5)
    Rbraycurtis['CustomerID']= cliente
    Rbraycurtis[0]=top5.ProductName[0]
    Rbraycurtis[1]=top5.ProductName[1]
    Rbraycurtis[2]=top5.ProductName[2]
    Rbraycurtis[3]=top5.ProductName[3]
    Rbraycurtis[4]=top5.ProductName[4]
    lbraycurtis.append(Rbraycurtis.copy())
    if cliente == 33:
        print(lista.CustomerID[0])
        print(iguales)

33
[1577, 264, 3531, 3909, 756]


In [65]:
cuadradocanberra = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'canberra'))), index = tablita.columns, columns = tablita.columns)

In [88]:
cuadradocanberra.head(5)

| CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1.0 | 0.008621 | 0.010638 | 0.010101 | 0.009091 | 0.009615 | 0.008621 | 0.009434 | 0.009259 | 0.009615 | ... | 0.008233 | 0.008649 | 0.008667 | 0.009028 | 0.008519 | 0.008425 | 0.008371 | 0.008502 | 0.007661 | 0.008447 |
| 200 | 0.008621 | 1.0 | 0.008696 | 0.008621 | 0.007519 | 0.00813 | 0.008547 | 0.008264 | 0.008696 | 0.009009 | ... | 0.007813 | 0.008045 | 0.008202 | 0.008604 | 0.007798 | 0.007784 | 0.008217 | 0.007985 | 0.007073 | 0.007549 |
| 264 | 0.010638 | 0.008696 | 1.0 | 0.009259 | 0.009524 | 0.009009 | 0.008696 | 0.00885 | 0.009009 | 0.009009 | ... | 0.008398 | 0.008587 | 0.008447 | 0.009226 | 0.008615 | 0.009015 | 0.008541 | 0.008678 | 0.007937 | 0.008238 |
| 356 | 0.010101 | 0.008621 | 0.009259 | 1.0 | 0.008772 | 0.008929 | 0.008772 | 0.01 | 0.008772 | 0.009091 | ... | 0.008244 | 0.008581 | 0.008598 | 0.00922 | 0.008376 | 0.008592 | 0.00802 | 0.008754 | 0.007932 | 0.008382 |
| 412 | 0.009091 | 0.007519 | 0.009524 | 0.008772 | 1.0 | 0.008696 | 0.008696 | 0.009524 | 0.008547 | 0.009174 | ... | 0.007665 | 0.008541 | 0.00804 | 0.008742 | 0.00805 | 0.008249 | 0.00772 | 0.008176 | 0.007638 | 0.008344 |


In [113]:
Rcanberra=dict()
lcanberra= []
clientes = data.CustomerID.unique()

for cliente in clientes:
    lista = cuadradocanberra.sort_values([cliente], ascending=False).head(6).reset_index()
    iguales = list(lista.CustomerID[1:])
    productos2 = productos.reset_index()
    productos2.set_index(['CustomerID'], inplace = True)
    acotado = productos2.loc[productos2.index.isin(iguales)]
    simi = pd.DataFrame(acotado.groupby(['ProductName'])['Quantity'].sum().sort_values(ascending=False))
    simi.reset_index(inplace=True)
    treintaytres = data[data['CustomerID']==cliente]
    productos33 = pd.DataFrame(treintaytres.groupby('ProductName')['Quantity'].sum())
    productos33.reset_index(inplace = True)
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()]
    nocomprados2 = nocomprados.reset_index()
    nocomprados3 = nocomprados2.drop(columns = ['index', 'Quantity_y'])
    top5 = nocomprados3.head(5)
    Rcanberra['CustomerID']= cliente
    Rcanberra[0]=top5.ProductName[0]
    Rcanberra[1]=top5.ProductName[1]
    Rcanberra[2]=top5.ProductName[2]
    Rcanberra[3]=top5.ProductName[3]
    Rcanberra[4]=top5.ProductName[4]
    lcanberra.append(Rcanberra.copy())
    if cliente == 33:
        print(lista.CustomerID[0])
        print(iguales)

33
[10016, 3909, 14913, 4261, 264]


In [67]:
cuadradochebyshev = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'chebyshev'))), index = tablita.columns, columns = tablita.columns)

In [95]:
cuadradochebyshev.head(5)

| CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | ... | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 |
| 200 | 0.5 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | ... | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 |
| 264 | 0.5 | 0.5 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | ... | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 |
| 356 | 0.5 | 0.5 | 0.5 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | ... | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 |
| 412 | 0.5 | 0.5 | 0.5 | 0.5 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | ... | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 | 0.038462 |
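Chebyshev distance keeps only the single largest per-product difference, max over i of |u_i - v_i|. With small integer quantities many pairs of customers share the same maximum, so this similarity matrix is full of ties: the first customers all score 0.5 = 1 / (1 + 1) against each other, and the last columns all show 0.038462, which is 1 / (1 + 25) rounded. With that many ties, the five neighbours chosen for customer 33 appear to depend largely on column order rather than on purchasing behaviour. A toy illustration with hypothetical vectors:

```python
from scipy.spatial.distance import chebyshev

# Hypothetical quantity vectors for three customers (one entry per product)
a = [1, 0, 1, 0]
b = [0, 1, 1, 0]
c = [1, 1, 0, 1]

for u, v in [(a, b), (a, c), (b, c)]:
    d = chebyshev(u, v)    # max over products of |u_i - v_i|
    print(d, 1 / (1 + d))  # every pair ties at distance 1, similarity 0.5
```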


In [114]:
Rchebyshev=dict()
lchebyshev= []
clientes = data.CustomerID.unique()

for cliente in clientes:
    lista = cuadradochebyshev.sort_values([cliente], ascending=False).head(6).reset_index()
    iguales = list(lista.CustomerID[1:])
    productos2 = productos.reset_index()
    productos2.set_index(['CustomerID'], inplace = True)
    acotado = productos2.loc[productos2.index.isin(iguales)]
    simi = pd.DataFrame(acotado.groupby(['ProductName'])['Quantity'].sum().sort_values(ascending=False))
    simi.reset_index(inplace=True)
    treintaytres = data[data['CustomerID']==cliente]
    productos33 = pd.DataFrame(treintaytres.groupby('ProductName')['Quantity'].sum())
    productos33.reset_index(inplace = True)
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()]
    nocomprados2 = nocomprados.reset_index()
    nocomprados3 = nocomprados2.drop(columns = ['index', 'Quantity_y'])
    top5 = nocomprados3.head(5)
    Rchebyshev['CustomerID']= cliente
    Rchebyshev[0]=top5.ProductName[0]
    Rchebyshev[1]=top5.ProductName[1]
    Rchebyshev[2]=top5.ProductName[2]
    Rchebyshev[3]=top5.ProductName[3]
    Rchebyshev[4]=top5.ProductName[4]
    lchebyshev.append(Rchebyshev.copy())
    if cliente == 33:
        print(lista.CustomerID[0])
        print(iguales)

33
[2187, 2503, 2556, 2566, 2582]


In [69]:
cuadradocityblock = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'cityblock'))), index = tablita.columns, columns = tablita.columns)

In [90]:
cuadradocityblock.head(5)

| CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1.0 | 0.008621 | 0.010638 | 0.010101 | 0.009091 | 0.009615 | 0.008621 | 0.009434 | 0.009259 | 0.009615 | ... | 0.000565 | 0.000635 | 0.000608 | 0.000712 | 0.00059 | 0.000615 | 0.000581 | 0.000615 | 0.000507 | 0.000582 |
| 200 | 0.008621 | 1.0 | 0.008696 | 0.008621 | 0.007519 | 0.00813 | 0.008547 | 0.008264 | 0.008696 | 0.009009 | ... | 0.000564 | 0.000633 | 0.000608 | 0.000713 | 0.000587 | 0.000612 | 0.000583 | 0.000614 | 0.000505 | 0.000577 |
| 264 | 0.010638 | 0.008696 | 1.0 | 0.009259 | 0.009524 | 0.009009 | 0.008696 | 0.00885 | 0.009009 | 0.009009 | ... | 0.000567 | 0.000635 | 0.000607 | 0.000716 | 0.000592 | 0.000621 | 0.000583 | 0.000618 | 0.00051 | 0.000581 |
| 356 | 0.010101 | 0.008621 | 0.009259 | 1.0 | 0.008772 | 0.008929 | 0.008772 | 0.01 | 0.008772 | 0.009091 | ... | 0.000565 | 0.000635 | 0.000608 | 0.000715 | 0.00059 | 0.000617 | 0.000578 | 0.000618 | 0.000509 | 0.000582 |
| 412 | 0.009091 | 0.007519 | 0.009524 | 0.008772 | 1.0 | 0.008696 | 0.008696 | 0.009524 | 0.008547 | 0.009174 | ... | 0.000562 | 0.000637 | 0.000605 | 0.000713 | 0.000589 | 0.000616 | 0.000577 | 0.000615 | 0.000509 | 0.000583 |


In [115]:
Rcityblock=dict()
lcityblock= []
clientes = data.CustomerID.unique()

for cliente in clientes:
    lista = cuadradocityblock.sort_values([cliente], ascending=False).head(6).reset_index()
    iguales = list(lista.CustomerID[1:])
    productos2 = productos.reset_index()
    productos2.set_index(['CustomerID'], inplace = True)
    acotado = productos2.loc[productos2.index.isin(iguales)]
    simi = pd.DataFrame(acotado.groupby(['ProductName'])['Quantity'].sum().sort_values(ascending=False))
    simi.reset_index(inplace=True)
    treintaytres = data[data['CustomerID']==cliente]
    productos33 = pd.DataFrame(treintaytres.groupby('ProductName')['Quantity'].sum())
    productos33.reset_index(inplace = True)
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()]
    nocomprados2 = nocomprados.reset_index()
    nocomprados3 = nocomprados2.drop(columns = ['index', 'Quantity_y'])
    top5 = nocomprados3.head(5)
    Rcityblock['CustomerID']= cliente
    Rcityblock[0]=top5.ProductName[0]
    Rcityblock[1]=top5.ProductName[1]
    Rcityblock[2]=top5.ProductName[2]
    Rcityblock[3]=top5.ProductName[3]
    Rcityblock[4]=top5.ProductName[4]
    lcityblock.append(Rcityblock.copy())
    if cliente == 33:
        print(lista.CustomerID[0])
        print(iguales)

33
[3909, 264, 3531, 2503, 3305]


In [71]:
cuadradocorrelation = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'correlation'))), index = tablita.columns, columns = tablita.columns)

In [91]:
cuadradocorrelation.head(5)

| CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1.0 | 0.493805 | 0.529875 | 0.515035 | 0.50057 | 0.504367 | 0.487132 | 0.492406 | 0.494838 | 0.500815 | ... | 0.491005 | 0.486159 | 0.498339 | 0.480313 | 0.497157 | 0.48075 | 0.492063 | 0.485173 | 0.482914 | 0.496579 |
| 200 | 0.493805 | 1.0 | 0.500545 | 0.496849 | 0.472695 | 0.483532 | 0.504988 | 0.48155 | 0.500545 | 0.506509 | ... | 0.508587 | 0.496849 | 0.51706 | 0.506602 | 0.497813 | 0.487096 | 0.528445 | 0.499846 | 0.48342 | 0.484852 |
| 264 | 0.529875 | 0.500545 | 1.0 | 0.498278 | 0.51708 | 0.493067 | 0.494101 | 0.481681 | 0.493067 | 0.489652 | ... | 0.511597 | 0.493648 | 0.496434 | 0.502956 | 0.513787 | 0.526176 | 0.513045 | 0.506386 | 0.510648 | 0.494672 |
| 356 | 0.515035 | 0.496849 | 0.498278 | 1.0 | 0.494727 | 0.489104 | 0.494727 | 0.510742 | 0.484642 | 0.49019 | ... | 0.4986 | 0.489643 | 0.50169 | 0.498569 | 0.495876 | 0.497664 | 0.478039 | 0.5071 | 0.507023 | 0.499809 |
| 412 | 0.50057 | 0.472695 | 0.51708 | 0.494727 | 1.0 | 0.494101 | 0.503354 | 0.510747 | 0.489748 | 0.504478 | ... | 0.485869 | 0.513118 | 0.492877 | 0.49941 | 0.50031 | 0.502343 | 0.482847 | 0.497874 | 0.510144 | 0.522382 |


In [116]:
Rcorrelation=dict()
lcorrelation= []
clientes = data.CustomerID.unique()

for cliente in clientes:
    lista = cuadradocorrelation.sort_values([cliente], ascending=False).head(6).reset_index()
    iguales = list(lista.CustomerID[1:])
    productos2 = productos.reset_index()
    productos2.set_index(['CustomerID'], inplace = True)
    acotado = productos2.loc[productos2.index.isin(iguales)]
    simi = pd.DataFrame(acotado.groupby(['ProductName'])['Quantity'].sum().sort_values(ascending=False))
    simi.reset_index(inplace=True)
    treintaytres = data[data['CustomerID']==cliente]
    productos33 = pd.DataFrame(treintaytres.groupby('ProductName')['Quantity'].sum())
    productos33.reset_index(inplace = True)
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()]
    nocomprados2 = nocomprados.reset_index()
    nocomprados3 = nocomprados2.drop(columns = ['index', 'Quantity_y'])
    top5 = nocomprados3.head(5)
    Rcorrelation['CustomerID']= cliente
    Rcorrelation[0]=top5.ProductName[0]
    Rcorrelation[1]=top5.ProductName[1]
    Rcorrelation[2]=top5.ProductName[2]
    Rcorrelation[3]=top5.ProductName[3]
    Rcorrelation[4]=top5.ProductName[4]
    lcorrelation.append(Rcorrelation.copy())
    if cliente == 33:
        print(lista.CustomerID[0])
        print(iguales)

33
[10016, 27672, 45786, 38265, 5224]


In [73]:
cuadradocosine = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'cosine'))), index = tablita.columns, columns = tablita.columns)

In [92]:
cuadradocosine.head(5)

| CustomerID | 33 | 200 | 264 | 356 | 412 | 464 | 477 | 639 | 649 | 669 | ... | 97697 | 97753 | 97769 | 97793 | 97900 | 97928 | 98069 | 98159 | 98185 | 98200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1.0 | 0.532692 | 0.565443 | 0.550466 | 0.538239 | 0.540192 | 0.524859 | 0.527028 | 0.530712 | 0.536037 | ... | 0.529019 | 0.521742 | 0.534813 | 0.51365 | 0.534245 | 0.516959 | 0.529475 | 0.52138 | 0.523291 | 0.533971 |
| 200 | 0.532692 | 1.0 | 0.540444 | 0.536424 | 0.514712 | 0.523506 | 0.546896 | 0.520122 | 0.540444 | 0.545661 | ... | 0.550794 | 0.536424 | 0.557488 | 0.543589 | 0.539102 | 0.527408 | 0.569724 | 0.540096 | 0.528386 | 0.526533 |
| 264 | 0.565443 | 0.540444 | 1.0 | 0.534784 | 0.555621 | 0.529915 | 0.532824 | 0.517251 | 0.529915 | 0.525866 | ... | 0.5505 | 0.530173 | 0.533903 | 0.537132 | 0.551752 | 0.563065 | 0.551325 | 0.54348 | 0.551922 | 0.533084 |
| 356 | 0.550466 | 0.536424 | 0.534784 | 1.0 | 0.53311 | 0.525642 | 0.53311 | 0.545887 | 0.521187 | 0.526088 | ... | 0.537266 | 0.525862 | 0.538805 | 0.532472 | 0.533641 | 0.534488 | 0.516147 | 0.543866 | 0.547973 | 0.53786 |
| 412 | 0.538239 | 0.514712 | 0.555621 | 0.53311 | 1.0 | 0.532824 | 0.544 | 0.547992 | 0.528487 | 0.542463 | ... | 0.526924 | 0.551365 | 0.532274 | 0.535338 | 0.540327 | 0.541355 | 0.523256 | 0.536914 | 0.553549 | 0.562502 |


In [111]:
Rcosine=dict()
lcosine= []
clientes = data.CustomerID.unique()

for cliente in clientes:
    lista = cuadradocosine.sort_values([cliente], ascending=False).head(6).reset_index()
    iguales = list(lista.CustomerID[1:])
    productos2 = productos.reset_index()
    productos2.set_index(['CustomerID'], inplace = True)
    acotado = productos2.loc[productos2.index.isin(iguales)]
    simi = pd.DataFrame(acotado.groupby(['ProductName'])['Quantity'].sum().sort_values(ascending=False))
    simi.reset_index(inplace=True)
    treintaytres = data[data['CustomerID']==cliente]
    productos33 = pd.DataFrame(treintaytres.groupby('ProductName')['Quantity'].sum())
    productos33.reset_index(inplace = True)
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()]
    nocomprados2 = nocomprados.reset_index()
    nocomprados3 = nocomprados2.drop(columns = ['index', 'Quantity_y'])
    top5 = nocomprados3.head(5)
    Rcosine['CustomerID']= cliente
    Rcosine[0]=top5.ProductName[0]
    Rcosine[1]=top5.ProductName[1]
    Rcosine[2]=top5.ProductName[2]
    Rcosine[3]=top5.ProductName[3]
    Rcosine[4]=top5.ProductName[4]
    lcosine.append(Rcosine.copy())
    if cliente == 33:
        print(lista.CustomerID[0])
        print(iguales)

33
[27672, 10016, 60862, 76532, 85766]
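Step 10 asks to note the differences between metrics. One quick check is how many of customer 33's five neighbours each metric shares with the Euclidean baseline; the lists below are copied from the printed outputs for customer 33 in this notebook.

```python
# Customer 33's five nearest neighbours, as printed for each metric
neighbours = {
    "euclidean":   [3909, 3531, 264, 2503, 3305],
    "braycurtis":  [1577, 264, 3531, 3909, 756],
    "canberra":    [10016, 3909, 14913, 4261, 264],
    "chebyshev":   [2187, 2503, 2556, 2566, 2582],
    "cityblock":   [3909, 264, 3531, 2503, 3305],
    "correlation": [10016, 27672, 45786, 38265, 5224],
    "cosine":      [27672, 10016, 60862, 76532, 85766],
}

# Count how many neighbours each metric shares with the Euclidean list
baseline = set(neighbours["euclidean"])
overlap = {m: len(baseline & set(ids)) for m, ids in neighbours.items()}
print(overlap)
```

Cityblock picks the same five customers as Euclidean (only 264 and 3531 swap places), Bray-Curtis keeps three of them, and correlation and cosine, which both discount overall purchase volume, share none; they do, however, agree with each other on 10016 and 27672.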


In [75]:
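# Distance -> similarity via 1/(1 + d). With no VI argument, pdist estimates the
# inverse covariance matrix from tablita.T itself (customers as observations)
cuadradomahalanobis = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'mahalanobis'))), index = tablita.columns, columns = tablita.columns)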
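Because no `VI` argument is passed, SciPy's `pdist` estimates the inverse covariance matrix from the rows of `tablita.T`, treating customers as observations and products as variables. A small sketch on random toy data (the shape and seed are arbitrary assumptions) checks that the default matches an explicit `VI = inv(cov(X.T)).T`, and shows that with no more observations than variables the covariance matrix is singular and `pdist` raises an error instead of returning distances:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# 50 toy "customers" (rows) x 4 toy "products" (columns)
X = rng.poisson(3, size=(50, 4)).astype(float)

# Default: pdist estimates VI from X itself
d_default = pdist(X, 'mahalanobis')
# The same computation with the inverse covariance passed explicitly
VI = np.linalg.inv(np.cov(X.T)).T
d_explicit = pdist(X, 'mahalanobis', VI=VI)
print(np.allclose(d_default, d_explicit))  # True

# 3 observations of 4 variables: the covariance matrix cannot be inverted
try:
    pdist(X[:3], 'mahalanobis')
except ValueError as err:
    print('ValueError:', err)
```

In the notebook this means the Mahalanobis matrix is only computable because there are more customers than products.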

In [93]:
cuadradomahalanobis.head(5)

CustomerID,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
33,1.0,0.223809,0.246158,0.226607,0.235571,0.223056,0.215632,0.217401,0.224951,0.226557,...,0.033655,0.033976,0.034061,0.034247,0.034071,0.034016,0.033896,0.033898,0.033483,0.033942
200,0.223809,1.0,0.216018,0.212059,0.213119,0.217282,0.219524,0.210211,0.23089,0.21911,...,0.033765,0.03404,0.034234,0.034274,0.034158,0.034161,0.034199,0.034008,0.033545,0.033961
264,0.246158,0.216018,1.0,0.222772,0.224746,0.209379,0.21312,0.204069,0.213768,0.206947,...,0.033581,0.034065,0.034012,0.034369,0.034117,0.034139,0.03405,0.03392,0.033721,0.033944
356,0.226607,0.212059,0.222772,1.0,0.216621,0.213005,0.224125,0.232875,0.211539,0.219396,...,0.033864,0.034032,0.034091,0.034409,0.034079,0.033963,0.033805,0.033946,0.033667,0.034077
412,0.235571,0.213119,0.224746,0.216621,1.0,0.219658,0.213812,0.223039,0.228258,0.231841,...,0.033687,0.034298,0.034114,0.03434,0.033844,0.03407,0.033918,0.033823,0.033682,0.034118


In [110]:
Rmahalanobis = dict()
lmahalanobis = []
clientes = data.CustomerID.unique()

# Purchases indexed by CustomerID; loop-invariant, so build it once
productos2 = productos.reset_index().set_index('CustomerID')

for cliente in clientes:
    # The 6 most similar customers; the customer itself (similarity 1.0) normally comes first
    lista = cuadradomahalanobis.sort_values([cliente], ascending=False).head(6).reset_index()
    # Keep 5 neighbours, excluding the customer even if a tie moved it off position 0
    iguales = [c for c in lista.CustomerID if c != cliente][:5]
    # Total quantity of each product bought by the neighbours, most purchased first
    acotado = productos2.loc[productos2.index.isin(iguales)]
    simi = acotado.groupby('ProductName')['Quantity'].sum().sort_values(ascending=False).reset_index()
    # Products the current customer has already bought
    treintaytres = data[data['CustomerID'] == cliente]
    productos33 = treintaytres.groupby('ProductName')['Quantity'].sum().reset_index()
    # Left merge: a missing Quantity_y means the customer never bought that product
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()].reset_index(drop=True)
    top5 = nocomprados.head(5)
    Rmahalanobis['CustomerID'] = cliente
    for i in range(5):
        Rmahalanobis[i] = top5.ProductName[i]
    lmahalanobis.append(Rmahalanobis.copy())
    if cliente == 33:
        print(lista.CustomerID[0])
        print(iguales)

33
[264, 3535, 2187, 3317, 756]


In [77]:
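# Distance -> similarity via 1/(1 + d). pdist's 'minkowski' metric defaults to p=2,
# so these are the same distances as the Euclidean metric
cuadradominkowski = pd.DataFrame(1/(1 + squareform(pdist(tablita.T, 'minkowski'))), index = tablita.columns, columns = tablita.columns)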
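SciPy's `minkowski` metric takes an order `p` that defaults to 2, so the call above, which passes no `p`, yields plain Euclidean distances; `p=1` would reproduce the city-block metric. A quick check on a toy matrix (values chosen arbitrarily), followed by the same `1/(1 + d)` similarity transform the notebook uses:

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Three toy "customers" (rows) x three "products" (columns)
X = np.array([[1, 0, 2],
              [0, 1, 2],
              [3, 0, 0]], dtype=float)

# Default order p=2 gives the Euclidean distance
print(np.allclose(pdist(X, 'minkowski'), pdist(X, 'euclidean')))      # True
# p=1 gives the city-block (Manhattan) distance
print(np.allclose(pdist(X, 'minkowski', p=1), pdist(X, 'cityblock')))  # True

# Distance -> similarity, as in the notebook: each customer scores 1.0 with itself
sim = pd.DataFrame(1 / (1 + squareform(pdist(X, 'minkowski'))))
print(sim.round(3))
```

Passing a different `p` to `pdist` is the way to make this cell differ from a Euclidean-based recommender.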

In [94]:
cuadradominkowski.head(5)

CustomerID,33,200,264,356,412,464,477,639,649,669,...,97697,97753,97769,97793,97900,97928,98069,98159,98185,98200
33,1.0,0.085297,0.093953,0.091747,0.08741,0.089695,0.085297,0.088913,0.088152,0.089695,...,0.004809,0.005108,0.004996,0.005421,0.00492,0.005023,0.00488,0.005026,0.004549,0.004883
200,0.085297,1.0,0.085638,0.085297,0.08007,0.08302,0.084959,0.083651,0.085638,0.087047,...,0.004825,0.005121,0.005014,0.005448,0.004925,0.005032,0.004909,0.005042,0.004553,0.004879
264,0.093953,0.085638,1.0,0.088152,0.089301,0.087047,0.085638,0.086333,0.087047,0.087047,...,0.004822,0.005115,0.004996,0.005441,0.004932,0.005055,0.004894,0.005042,0.004566,0.004883
356,0.091747,0.085297,0.088152,1.0,0.085983,0.086688,0.085983,0.091325,0.085983,0.08741,...,0.004814,0.005111,0.004999,0.005437,0.00492,0.005036,0.004871,0.005042,0.004563,0.004886
412,0.08741,0.08007,0.089301,0.085983,1.0,0.085638,0.085638,0.089301,0.084959,0.087779,...,0.004808,0.005131,0.004996,0.005441,0.004925,0.005042,0.004876,0.005039,0.004568,0.004903


In [109]:
Rminkowski = dict()
lminkowski = []
clientes = data.CustomerID.unique()

# Purchases indexed by CustomerID; loop-invariant, so build it once
productos2 = productos.reset_index().set_index('CustomerID')

for cliente in clientes:
    # The 6 most similar customers; the customer itself (similarity 1.0) normally comes first
    lista = cuadradominkowski.sort_values([cliente], ascending=False).head(6).reset_index()
    # Keep 5 neighbours, excluding the customer even if a tie moved it off position 0
    iguales = [c for c in lista.CustomerID if c != cliente][:5]
    # Total quantity of each product bought by the neighbours, most purchased first
    acotado = productos2.loc[productos2.index.isin(iguales)]
    simi = acotado.groupby('ProductName')['Quantity'].sum().sort_values(ascending=False).reset_index()
    # Products the current customer has already bought
    treintaytres = data[data['CustomerID'] == cliente]
    productos33 = treintaytres.groupby('ProductName')['Quantity'].sum().reset_index()
    # Left merge: a missing Quantity_y means the customer never bought that product
    df_inner = pd.merge(simi, productos33, on='ProductName', how='left')
    nocomprados = df_inner[df_inner.Quantity_y.isna()].reset_index(drop=True)
    top5 = nocomprados.head(5)
    Rminkowski['CustomerID'] = cliente
    for i in range(5):
        Rminkowski[i] = top5.ProductName[i]
    lminkowski.append(Rminkowski.copy())
    if cliente == 33:
        print(lista.CustomerID[0])
        print(iguales)

33
[3909, 3531, 264, 2503, 3305]
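The cosine, Mahalanobis and Minkowski loops differ only in the similarity matrix they read, and the loops behind the other metrics' results are likely the same. A sketch of how that logic could be factored into one function; `recommend` and its parameters are hypothetical names, and it assumes purchases in long format with `CustomerID`, `ProductName` and `Quantity` columns, such as `productos.reset_index()`. It excludes the customer by label and returns fewer than `n_recs` items, rather than raising a `KeyError`, when the neighbours bought too few new products:

```python
import pandas as pd

def recommend(similarity, purchases, customer, n_neighbours=5, n_recs=5):
    """Return up to n_recs products bought by the most similar customers
    but not yet by `customer` (hypothetical helper).

    similarity: square DataFrame of customer-to-customer similarities,
                index and columns both CustomerIDs.
    purchases:  long-format DataFrame with CustomerID, ProductName, Quantity.
    """
    # Nearest neighbours by similarity, excluding the customer itself
    neighbours = (similarity[customer].drop(customer)
                  .sort_values(ascending=False)
                  .head(n_neighbours).index)
    # Total quantity per product across the neighbours, most purchased first
    totals = (purchases[purchases.CustomerID.isin(neighbours)]
              .groupby('ProductName')['Quantity'].sum()
              .sort_values(ascending=False))
    # Skip products the customer has already bought
    owned = set(purchases.loc[purchases.CustomerID == customer, 'ProductName'])
    return [p for p in totals.index if p not in owned][:n_recs]


# Toy example: customer 2 is customer 1's closest neighbour
sim = pd.DataFrame([[1.0, 0.9, 0.1],
                    [0.9, 1.0, 0.2],
                    [0.1, 0.2, 1.0]], index=[1, 2, 3], columns=[1, 2, 3])
purchases = pd.DataFrame({'CustomerID': [1, 2, 2, 2, 3],
                          'ProductName': ['Milk', 'Milk', 'Bread', 'Eggs', 'Tea'],
                          'Quantity': [1, 4, 3, 1, 9]})
print(recommend(sim, purchases, 1, n_neighbours=1))  # ['Bread', 'Eggs']
```

In the notebook each loop body could then become a call such as `recommend(cuadradocosine, productos.reset_index(), cliente)`. Products with equal totals may come out in a different order than in the loops, since neither version breaks ties explicitly.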


# Recommendations by the different methods

In [79]:
braycurtis = pd.DataFrame(lbraycurtis)
braycurtis.set_index('CustomerID', inplace = True)
braycurtis.head(5)

CustomerID,0,1,2,3,4
61288,"Cheese - Brie, Triple Creme",Snapple Lemon Tea,"Coconut - Shredded, Sweet",Meldea Green Tea Liquor,Water - Aquafina Vitamin
77352,Garbag Bags - Black,Browning Caramel Glace,Water - Aquafina Vitamin,Tia Maria,Flavouring - Orange
40094,Wine - Sogrape Mateus Rose,"Juice - Cranberry, 341 Ml",Snapple - Iced Tea Peach,Rosemary - Dry,Spinach - Baby
23548,Cheese Cloth No 100,Sponge Cake Mix - Chocolate,Kellogs Special K Cereal,Milk - 2%,"Lamb - Pieces, Diced"
78981,Wine - Blue Nun Qualitatswein,Baking Powder,Cheese - Cottage Cheese,"Salmon - Atlantic, Skin On",Cocoa Butter


In [80]:
canberra = pd.DataFrame(lcanberra)
canberra.set_index('CustomerID', inplace = True)
canberra.head(5)

CustomerID,0,1,2,3,4
61288,"Appetizer - Mini Egg Roll, Shrimp","Wine - White, Schroder And Schyl",Pail With Metal Handle 16l White,Assorted Desserts,"Beef - Tenderlion, Center Cut"
77352,Grenadine,"Wine - Red, Colio Cabernet",Wine - White Cab Sauv.on,Beef - Prime Rib Aaa,Garbage Bags - Clear
40094,Cinnamon Buns Sticky,"Wine - Magnotta, Merlot Sr Vqa",Sponge Cake Mix - Chocolate,Pail For Lid 1537,"Turkey - Whole, Fresh"
23548,"Sauce - Gravy, Au Jus, Mix",Beef - Rib Eye Aaa,Napkin White - Starched,"Lamb - Pieces, Diced",Beef - Texas Style Burger
78981,Duck - Breast,Muffin - Zero Transfat,Cookie Dough - Double,"Lamb - Pieces, Diced",French Pastry - Mini Chocolate


In [81]:
chebyshev = pd.DataFrame(lchebyshev)
chebyshev.set_index('CustomerID', inplace = True)
chebyshev.head(5)

CustomerID,0,1,2,3,4
61288,Blueberries,Cheese - Mozzarella,Wine - Two Oceans Cabernet,"Soup - Campbells, Cream Of",Snapple Lemon Tea
77352,Bandage - Flexible Neon,Mangoes,"Sole - Dover, Whole, Fresh","Chicken - Leg, Boneless",Wiberg Super Cure
40094,"Turkey - Whole, Fresh",Phyllo Dough,Olives - Stuffed,Kellogs Special K Cereal,Rosemary - Dry
23548,Kellogs All Bran Bars,Wine - Ej Gallo Sierra Valley,Meldea Green Tea Liquor,Oil - Safflower,"Nut - Pistachio, Shelled"
78981,Mangoes,Wine - Vineland Estate Semi - Dry,Tomatoes Tear Drop,Apricots Fresh,Lettuce - Frisee


In [82]:
cityblock = pd.DataFrame(lcityblock)
cityblock.set_index('CustomerID', inplace = True)
cityblock.head(5)

CustomerID,0,1,2,3,4
61288,Pork - Kidney,"Cheese - Boursin, Garlic / Herbs",Ice Cream Bar - Hageen Daz To,Bay Leaf,Wine - Ej Gallo Sierra Valley
77352,Veal - Sweetbread,Soup - Campbells Tomato Ravioli,Tia Maria,Sausage - Liver,Juice - Lime
40094,Butter - Unsalted,Tia Maria,Sugar - Fine,Soup - Campbells Bean Medley,Cod - Black Whole Fillet
23548,Soup - Campbells Tomato Ravioli,Beef - Texas Style Burger,"Juice - Cranberry, 341 Ml",Veal - Sweetbread,Lamb - Ground
78981,Watercress,Veal - Sweetbread,Knife Plastic - White,Hersey Shakes,Cumin - Whole


In [83]:
correlation = pd.DataFrame(lcorrelation)
correlation.set_index('CustomerID', inplace = True)
correlation.head(5)

CustomerID,0,1,2,3,4
61288,Blueberries,Wine - Magnotta - Cab Sauv,"Wine - Red, Harrow Estates, Cab",Chips Potato Salt Vinegar 43g,"Lentils - Red, Dry"
77352,Juice - Apple Cider,Grenadine,Garbage Bags - Clear,Mangoes,Crab - Imitation Flakes
40094,"Bread - Roll, Soft White Round",Mayonnaise - Individual Pkg,Beef - Inside Round,Truffle Cups - Brown,Juice - V8 Splash
23548,Zucchini - Yellow,Wine - Crozes Hermitage E.,Wine - Ruffino Chianti,Olives - Stuffed,Crackers Cheez It
78981,Mussels - Frozen,Onion Powder,Bread - Raisin Walnut Oval,Sauerkraut,Garbage Bags - Clear


In [84]:
cosine = pd.DataFrame(lcosine)
cosine.set_index('CustomerID', inplace = True)
cosine.head(5)

CustomerID,0,1,2,3,4
61288,Foam Dinner Plate,"Cheese - Brie, Triple Creme",Ecolab - Mikroklene 4/4 L,Sauce - Hollandaise,"Yogurt - Blueberry, 175 Gr"
77352,Mangoes,Lettuce - Frisee,Zucchini - Yellow,Apricots Fresh,Juice - Apple Cider
40094,Truffle Cups - Brown,"Bread - Roll, Soft White Round",Juice - V8 Splash,Beef - Inside Round,Pants Custom Dry Clean
23548,Crackers Cheez It,Bandage - Flexible Neon,Cookies Cereal Nut,Beef - Top Sirloin - Aaa,Mayonnaise - Individual Pkg
78981,Cheese - Cottage Cheese,Baking Powder,Mussels - Frozen,Onion Powder,Bread - Raisin Walnut Oval


In [85]:
mahalanobis = pd.DataFrame(lmahalanobis)
mahalanobis.set_index('CustomerID', inplace = True)
mahalanobis.head(5)

CustomerID,0,1,2,3,4
61288,Smirnoff Green Apple Twist,Crush - Cream Soda,Salmon - Sockeye Raw,Cumin - Whole,Cod - Black Whole Fillet
77352,Smirnoff Green Apple Twist,Appetizer - Sausage Rolls,Watercress,Macaroons - Two Bite Choc,Spinach - Baby
40094,Pop Shoppe Cream Soda,Sausage - Breakfast,Ketchup - Tomato,Lettuce - Treviso,Sea Bass - Whole
23548,Soup - Campbells Bean Medley,Sage - Ground,Milk - 2%,Blueberries,Sardines
78981,"Veal - Inside, Choice",Ocean Spray - Ruby Red,Blueberries,Veal - Inside,Curry Paste - Madras


In [86]:
minkowski = pd.DataFrame(lminkowski)
minkowski.set_index('CustomerID', inplace = True)
minkowski.head(5)

CustomerID,0,1,2,3,4
61288,"Mushrooms - Black, Dried","Wine - Magnotta, Merlot Sr Vqa",Chicken - Soup Base,Wine - Chardonnay South,Milk - 1%
77352,Guinea Fowl,Grenadine,"Oranges - Navel, 72",Ecolab - Mikroklene 4/4 L,"Shrimp - Baby, Warm Water"
40094,"Water - Mineral, Natural","Oregano - Dry, Rubbed",Pasta - Orecchiette,Quiche Assorted,Tuna - Salad Premix
23548,Wanton Wrap,Banana Turning,Flavouring - Orange,"Chocolate - Semi Sweet, Calets",Lettuce - Treviso
78981,Lettuce - Frisee,Longos - Chicken Wings,Pop Shoppe Cream Soda,Beef - Inside Round,Sprouts - Alfalfa
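To compare two methods across all customers rather than by eye on the first five rows, one option is the mean Jaccard overlap of each customer's recommendation set: 1.0 means both methods recommend exactly the same products, 0.0 means no product in common. A sketch under that assumption; `mean_jaccard` is a hypothetical helper, and it ignores the order of the five recommendations:

```python
import pandas as pd

def mean_jaccard(recs_a, recs_b):
    """Mean Jaccard overlap between two recommendation tables
    (hypothetical helper). Both tables are indexed by CustomerID with
    one column per recommendation slot, like `cosine` or `minkowski`."""
    shared = recs_a.index.intersection(recs_b.index)
    scores = []
    for customer in shared:
        a = set(recs_a.loc[customer])
        b = set(recs_b.loc[customer])
        scores.append(len(a & b) / len(a | b))
    # NaN when the two tables share no customers
    return sum(scores) / len(scores) if scores else float('nan')


# Toy tables with two recommendation slots per customer
idx = pd.Index([1, 2], name='CustomerID')
recs_a = pd.DataFrame({0: ['A', 'B'], 1: ['C', 'D']}, index=idx)
recs_b = pd.DataFrame({0: ['A', 'B'], 1: ['Y', 'D']}, index=idx)
print(mean_jaccard(recs_a, recs_b))
```

Applied to the notebook's tables, a call such as `mean_jaccard(cosine, minkowski)` would summarize how much those two metrics agree.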
