This workbook implements Content-Based Filtering for a cat recommendation system, working from the raw data all the way to model creation and initial results.

# Table of Contents

* [Load in Data and Segment Features from Context data](#segment)
* [Pre-process feature data](#pre-process)
    - [Make Needed Helper Functions and Imports](#pp_pipeline)
    - [Preprocess Data for model runs](#pp)
* [Run Content-Based-Filtering Modeling Iterations](#run_pipeline)
    - [Linear Similarity Results](#ls)
    - [Cosine similarity results](#cs)
    - [Laplacian Similarity Results](#lp)
    - [Overall Content Based Filtering results as of 1/2/2023](#ov)
* [Collaborative Filtering - Under Construction WIP](#cf)    
* [Conclusion and Next Steps](#conclusion)

# Load in Data and Segment Features from Context data<a id='segment'></a>

First, we load all of our adoptable cats.

In [1]:
from google.colab import drive
import joblib #so I can save files out

drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd
cats_DF = pd.read_csv("/content/drive/MyDrive/MLE10PetMatch/Adoptable_cats_20221125.csv",header=0,index_col=0)
cats_DF.shape


(49600, 50)

In [None]:
pd.set_option('display.max_columns', 500)
cats_DF.sample(3)

Unnamed: 0,id,organization_id,url,type,species,age,gender,size,coat,tags,name,description,organization_animal_id,photos,primary_photo_cropped,videos,status,status_changed_at,published_at,distance,breeds.primary,breeds.secondary,breeds.mixed,breeds.unknown,colors.primary,colors.secondary,colors.tertiary,attributes.spayed_neutered,attributes.house_trained,attributes.declawed,attributes.special_needs,attributes.shots_current,environment.children,environment.dogs,environment.cats,contact.email,contact.phone,contact.address.address1,contact.address.address2,contact.address.city,contact.address.state,contact.address.postcode,contact.address.country,animal_id,animal_type,organization_id.1,primary_photo_cropped.small,primary_photo_cropped.medium,primary_photo_cropped.large,primary_photo_cropped.full
14716,58924176,CO145,https://www.petfinder.com/cat/beau-58924176/co...,Cat,Cat,Baby,Male,Medium,Short,[],Beau,If you are interested in adopting a cat or kit...,,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-11-21T02:52:45+0000,2022-11-21T02:52:40+0000,,Domestic Short Hair,,False,False,Gray / Blue / Silver,,,True,True,False,False,True,,,,,(303) 432-2299,6010 West 88th Avenue,,Westminster,CO,80031,US,58924176,cat,co145,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...
25635,58861227,NY1506,https://www.petfinder.com/cat/scooby-polk-5886...,Cat,Cat,Young,Male,Medium,,[],Scooby - POLK,This gorgeous seal coat kitten is ready to fin...,BECC-A-10396,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-11-15T15:09:59+0000,2022-11-15T15:09:59+0000,,Domestic Short Hair,,False,False,,Brown / Chocolate,,True,False,False,False,True,,,,fosterpolk@bestfriends.org,,,,Lakeland,FL,33801,US,58861227,cat,ny1506,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...
24282,58870076,OH317,https://www.petfinder.com/cat/horus-58870076/o...,Cat,Cat,Adult,Male,Medium,,[],Horus,DOB:3/2/21\n\nCurrently Horus can be found hid...,17822667-982091068161721,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-11-16T03:05:20+0000,2022-11-16T03:05:20+0000,,Domestic Short Hair,,True,False,,,,True,True,False,False,False,,,,adopt@colonycats.org,(614) 570-0471,2740 Festival Ln,,Dublin,OH,43017,US,58870076,cat,oh317,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...


In [3]:
cats_DF.columns

Index(['id', 'organization_id', 'url', 'type', 'species', 'age', 'gender',
       'size', 'coat', 'tags', 'name', 'description', 'organization_animal_id',
       'photos', 'primary_photo_cropped', 'videos', 'status',
       'status_changed_at', 'published_at', 'distance', 'breeds.primary',
       'breeds.secondary', 'breeds.mixed', 'breeds.unknown', 'colors.primary',
       'colors.secondary', 'colors.tertiary', 'attributes.spayed_neutered',
       'attributes.house_trained', 'attributes.declawed',
       'attributes.special_needs', 'attributes.shots_current',
       'environment.children', 'environment.dogs', 'environment.cats',
       'contact.email', 'contact.phone', 'contact.address.address1',
       'contact.address.address2', 'contact.address.city',
       'contact.address.state', 'contact.address.postcode',
       'contact.address.country', 'animal_id', 'animal_type',
       'organization_id.1', 'primary_photo_cropped.small',
       'primary_photo_cropped.medium', 'primary_photo

Drop animals with no pictures, since pictures are key to our 'Tinder-like' app experience.

In [4]:
cats_DF = cats_DF.dropna(subset=['primary_photo_cropped.full'])# drop rows with 0 pictures
cats_DF.shape # matches na count via sweet viz for cats

(46805, 50)

Next we separate the dataframe into features to model over and context data that can be shown to the user for any matches. 'id' will be our shared key between the two tables.

Of note, the 'distance' field and 'primary_photo_cropped.full' field will be useful data for future model enhancements. For the models so far, we will simply use textual data and assume a 0 distance for all pets.

In [5]:
contextCols = ['id','organization_id','url','type','tags','name','description','organization_animal_id',
              'photos','primary_photo_cropped','videos','status','status_changed_at','published_at',
              'distance','contact.email', 'contact.phone', 'contact.address.address1',
               'contact.address.address2', 'contact.address.city','contact.address.state', 
               'contact.address.postcode','contact.address.country', 'animal_id', 'animal_type',
               'organization_id.1', 'primary_photo_cropped.small','primary_photo_cropped.medium',
               'primary_photo_cropped.large','primary_photo_cropped.full']
featureCols = ['id','age','gender','size','coat','breeds.primary', 'breeds.secondary','breeds.mixed',
              'breeds.unknown','colors.primary','colors.secondary','colors.tertiary',
              'attributes.spayed_neutered','attributes.house_trained','attributes.declawed',
              'attributes.special_needs','attributes.shots_current','environment.children',
              'environment.dogs','environment.cats','type','contact.address.postcode'] # initial columns to keep for training purposes
cats_DF_features = cats_DF[featureCols]
cats_DF_context = cats_DF[contextCols]
cats_DF_features.shape

(46805, 22)

Let's sanity check our missing values now that we just have cats and remove any columns with too many missing values.

In [6]:
valueCounts = cats_DF_features.set_index('type').isna().groupby(level=0).sum()/cats_DF_features.shape[0] # level=0 refers to our index, which we made 'type'


In [7]:
pd.set_option('display.max_columns', 500)
valueCounts 

type,id,age,gender,size,coat,breeds.primary,breeds.secondary,breeds.mixed,breeds.unknown,colors.primary,colors.secondary,colors.tertiary,attributes.spayed_neutered,attributes.house_trained,attributes.declawed,attributes.special_needs,attributes.shots_current,environment.children,environment.dogs,environment.cats,contact.address.postcode
Cat,0.0,0.0,0.0,0.0,0.616857,0.0,0.899925,0.0,0.0,0.393548,0.746523,0.916035,0.0,0.0,0.0,0.0,0.0,0.737015,0.829954,0.588612,2.1e-05


In [8]:
valueCounts = cats_DF_context.set_index('type').isna().groupby(level=0).sum()/cats_DF_context.shape[0] # level=0 refers to our index, which we made 'type'


In [9]:
pd.set_option('display.max_columns', 500)
valueCounts 

type,id,organization_id,url,tags,name,description,organization_animal_id,photos,primary_photo_cropped,videos,status,status_changed_at,published_at,distance,contact.email,contact.phone,contact.address.address1,contact.address.address2,contact.address.city,contact.address.state,contact.address.postcode,contact.address.country,animal_id,animal_type,organization_id.1,primary_photo_cropped.small,primary_photo_cropped.medium,primary_photo_cropped.large,primary_photo_cropped.full
Cat,0.0,0.0,0.0,0.0,2.1e-05,0.262365,0.313022,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.051405,0.193804,0.371499,0.923534,0.0,0.0,2.1e-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


After a quick NA check, we remove 'coat', 'breeds.secondary', 'colors.secondary', and 'colors.tertiary', each of which is missing in more than 60% of rows. The column 'colors.primary' is also missing many values (about 39%), but it helps differentiate one cat from another, so it is kept for now. Additionally, we keep the address postcode among the features as an initial attempt to match nearby cats together. Lastly, 'environment.children', 'environment.dogs', and 'environment.cats' are missing in roughly 59% to 83% of rows, but users derive a lot of value from this information, so they are kept as well.
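The column selection described above can be sketched on a toy frame. This is only an illustration: the data, the `MAX_MISSING` cutoff, and the `ALWAYS_KEEP` set are hypothetical and are not values used by this workbook, which picks its columns by hand.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cats_DF_features; the values are made up for illustration.
toy = pd.DataFrame({
    "age": ["Baby", "Adult", "Young", "Adult"],
    "coat": ["Short", np.nan, np.nan, np.nan],
    "colors.primary": ["Black", np.nan, "Torbie", "Black"],
    "environment.cats": [True, np.nan, np.nan, np.nan],
})

# Fraction of missing values per column (same as isna().sum() / len(toy)).
missing_fraction = toy.isna().mean()

MAX_MISSING = 0.5                    # hypothetical cutoff
ALWAYS_KEEP = {"environment.cats"}   # hypothetical: kept for user value despite gaps

keep_cols = [
    col for col in toy.columns
    if missing_fraction[col] <= MAX_MISSING or col in ALWAYS_KEEP
]
print(missing_fraction.to_dict())
print(keep_cols)
```

Keeping an explicit exception set makes the judgment calls (columns retained despite many gaps) visible next to the rule itself.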

In [10]:
featureCols = ['id','age','gender','size','breeds.primary','breeds.mixed','breeds.unknown',
               'colors.primary','attributes.spayed_neutered','attributes.house_trained',
               'attributes.declawed','attributes.special_needs','attributes.shots_current',
               'contact.address.postcode','environment.children','environment.dogs','environment.cats']
cats_DF_features = cats_DF[featureCols]
cats_DF_context = cats_DF[contextCols]
cats_DF_features.shape

(46805, 17)

In [11]:
cats_DF_features.dtypes

id                             int64
age                           object
gender                        object
size                          object
breeds.primary                object
breeds.mixed                    bool
breeds.unknown                  bool
colors.primary                object
attributes.spayed_neutered      bool
attributes.house_trained        bool
attributes.declawed             bool
attributes.special_needs        bool
attributes.shots_current        bool
contact.address.postcode      object
environment.children          object
environment.dogs              object
environment.cats              object
dtype: object

# Pre-process feature data<a id='pre-process'></a>

## Make Needed Helper Functions and Imports <a id='pp_pipeline'></a>

Define the helper functions needed for modeling later in the workbook.

In [12]:
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import laplacian_kernel
from sklearn.metrics.pairwise import linear_kernel
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

In [13]:
def remove_columns_with_1_distinct(df):
    drop_col = [e for e in df.columns if df[e].nunique()==1]
    df_return = df.drop(drop_col,axis=1)
    return df_return


In [14]:
def drop_duplicates(df):
    df_return = df.drop_duplicates()
    return df_return


In [15]:
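def linear_similarities(df_1,id_df):
    # Pairwise linear-kernel (dot product) scores between every pair of rows
    cs_simil = linear_kernel(df_1,df_1)
    results = {}
    ds = id_df # needs id column and a 0..n-1 index that lines up with the rows of df_1
    for idx, row in ds.iterrows():
        similar_indices = cs_simil[idx].argsort()[:-100:-1] # 99 highest-scoring rows, best first
        similar_items = [(cs_simil[idx][i], ds['id'][i]) for i in similar_indices]
        results[row['id']] = similar_items[1:] # skip the first entry, assumed to be the cat itself
    return results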

In [16]:
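def cosine_similarities(df_1,id_df):
    # Pairwise cosine similarity between every pair of rows
    cs_simil = cosine_similarity(df_1,df_1)
    results = {}
    ds = id_df # needs id column and a 0..n-1 index that lines up with the rows of df_1
    for idx, row in ds.iterrows():
        similar_indices = cs_simil[idx].argsort()[:-100:-1] # 99 highest-scoring rows, best first
        similar_items = [(cs_simil[idx][i], ds['id'][i]) for i in similar_indices]
        results[row['id']] = similar_items[1:] # skip the first entry, assumed to be the cat itself
    return results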

In [17]:
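def laplacian_similarities(df_1,id_df):
    # Pairwise Laplacian-kernel scores between every pair of rows
    cs_simil = laplacian_kernel(df_1,df_1)
    results = {}
    ds = id_df # needs id column and a 0..n-1 index that lines up with the rows of df_1
    for idx, row in ds.iterrows():
        similar_indices = cs_simil[idx].argsort()[:-100:-1] # 99 highest-scoring rows, best first
        similar_items = [(cs_simil[idx][i], ds['id'][i]) for i in similar_indices]
        results[row['id']] = similar_items[1:] # skip the first entry, assumed to be the cat itself
    return results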
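The three functions above share the same ranking step. As a sketch of that step in isolation, the hypothetical helper `top_k_similar` below ranks one row of a toy similarity matrix. Unlike the functions above, it excludes the query item by position rather than by dropping the first sorted entry, since with one-hot features other cats can tie with the query's own score and the query may then not sort first. The matrix and ids are made up.

```python
import numpy as np

def top_k_similar(similarity_row, self_idx, ids, k):
    """Return the k highest-scoring (score, id) pairs, excluding the query item."""
    order = np.argsort(similarity_row)[::-1]          # highest score first
    order = [i for i in order if i != self_idx][:k]   # drop the query explicitly
    return [(float(similarity_row[i]), ids[i]) for i in order]

# Toy 4x4 similarity matrix; rows and columns follow the order of toy_ids.
toy_ids = [101, 102, 103, 104]
toy_sim = np.array([
    [3.0, 3.0, 1.0, 2.0],
    [3.0, 3.0, 1.0, 2.0],
    [1.0, 1.0, 3.0, 0.0],
    [2.0, 2.0, 0.0, 3.0],
])

results = {
    toy_ids[i]: top_k_similar(toy_sim[i], i, toy_ids, k=2)
    for i in range(len(toy_ids))
}
print(results[101])
```

Cats 101 and 102 are identical in this toy matrix, which is exactly the tie case: each should still appear in the other's recommendations.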

In [18]:
def item(id,df):
    ds = df
    colsGrab = ['id']
    return ds.loc[ds['id'] == id][colsGrab].values[0] # look up this id's row in the context dataframe

def url(id,df):
    ds = df
    colsGrab = ['url']
    return ds.loc[ds['id'] == id][colsGrab].values[0] # Petfinder page URL for this id

def picture(id,df):
    ds = df
    colsGrab = ['primary_photo_cropped.full']
    return ds.loc[ds['id'] == id][colsGrab].values[0] # full-size primary photo URL for this id

def recommend(item_id, num,df,reccs):
    print("Recommending " + str(num) + " cats similar to " + str(item(item_id,df)) + "... " 
          + picture(item_id,df) + " - " + url(item_id,df))   
    print("-------")    
    recs = reccs[item_id][:num]   
    for rec in recs: 
        print("Recommended: " + str(item(rec[1],df)) + " (score:" +      str(rec[0]) + ") " 
              + picture(rec[1],df) + " - " + url(rec[1],df))
    
def score(reccs, num):
    print("Finding average recommendation score for top " + str(num) + " recommendations per example")
    results = []
    for key in reccs.keys():
        subRecs = reccs[key][:num] # top `num` recommendations for this cat
        for r in subRecs:
            results.append(r[0]) # r is a (score, id) pair
    averageRecc = sum(results) / len(results)
    print("There are " + str(len(results)) + " results with a sum of " + str(sum(results)) + " and an average of: "
          + str(averageRecc))
    return averageRecc
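To make the averaging concrete, here is a minimal sketch on a hand-built recommendations dictionary. The name `average_top_n_score` and the toy scores are hypothetical; the dictionary layout (id mapped to a list of `(score, id)` pairs, best first) mirrors what the similarity functions above return.

```python
def average_top_n_score(reccs, num):
    """Mean similarity score over the first `num` recommendations of every item."""
    scores = [s for recs in reccs.values() for s, _ in recs[:num]]
    return sum(scores) / len(scores)

# Two toy cats, each with three ranked (score, id) recommendations.
toy_reccs = {
    101: [(3.0, 102), (2.0, 104), (1.0, 103)],
    102: [(3.0, 101), (2.0, 104), (1.0, 103)],
}

avg = average_top_n_score(toy_reccs, num=2)
print(avg)  # (3 + 2 + 3 + 2) / 4 = 2.5
```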

## Preprocess Data for model runs <a id='pp'></a>

Now that the essential methods are defined, let's handle the data.

In [19]:
cats_DF_features.head(3) # sneak peek of what we have to work with initially

Unnamed: 0,id,age,gender,size,breeds.primary,breeds.mixed,breeds.unknown,colors.primary,attributes.spayed_neutered,attributes.house_trained,attributes.declawed,attributes.special_needs,attributes.shots_current,contact.address.postcode,environment.children,environment.dogs,environment.cats
1,58980784,Baby,Male,Medium,Tuxedo,False,False,Black & White / Tuxedo,True,True,False,False,True,37343,True,,True
13,58980778,Baby,Male,Medium,Domestic Short Hair,False,False,Black,True,True,False,False,True,92057,True,,True
14,58980506,Young,Female,Medium,Domestic Short Hair,False,False,Torbie,True,True,False,False,True,50126,,,True


Besides the id value, which is our shared key, all other fields are categorical. We can use One-Hot Encoding to transform them into numeric vectors that similarity models can work with.

Some preprocessing must occur before One-Hot Encoding to ensure everything goes as planned. First, we proactively drop duplicate rows. Second, we remove any feature with only one distinct value: content-based filtering relies on differences between items, so a feature that is identical for every cat carries no information. Third, we cast the postcode to a string. Fourth, we convert the boolean columns to strings so that each column holds a single type. Lastly, we replace NaNs with a special 'Not Available' string, which gives missing values their own category.

In [20]:
# Preprocess data before encoding occurs for some troublesome fields
X = cats_DF_features
X = drop_duplicates(X) # remove duplicate rows
X = remove_columns_with_1_distinct(X) # remove any features with only 1 distinct value
X["contact.address.postcode"] = X["contact.address.postcode"].astype(str) # fix postcode to be a str rather than an int
# One-Hot Encoder needs each column to hold a single type, so bools are converted to strings
bool_cols = ['breeds.mixed', 'attributes.spayed_neutered', 'attributes.house_trained',
             'attributes.declawed', 'attributes.special_needs', 'attributes.shots_current',
             'environment.children', 'environment.dogs', 'environment.cats']
for col in bool_cols:
    X[col] = X[col].map({True: 'True', False: 'False'}) # NaN stays NaN here and is handled below
X = X.replace(np.nan,'Not Available') # replace nan's with their own special category, do this last once types all fixed!
X.dtypes
#target = 'todo' # will be rankings once we have them
#X, y = cats_DF_features.drop(columns=target), cats_DF_features[target]

id                             int64
age                           object
gender                        object
size                          object
breeds.primary                object
breeds.mixed                  object
colors.primary                object
attributes.spayed_neutered    object
attributes.house_trained      object
attributes.declawed           object
attributes.special_needs      object
attributes.shots_current      object
contact.address.postcode      object
environment.children          object
environment.dogs              object
environment.cats              object
dtype: object

In [21]:
X_transform = X

In [22]:
X_transform.shape

(46710, 16)

In [23]:
X_transform.head(3)

Unnamed: 0,id,age,gender,size,breeds.primary,breeds.mixed,colors.primary,attributes.spayed_neutered,attributes.house_trained,attributes.declawed,attributes.special_needs,attributes.shots_current,contact.address.postcode,environment.children,environment.dogs,environment.cats
1,58980784,Baby,Male,Medium,Tuxedo,False,Black & White / Tuxedo,True,True,False,False,True,37343,True,Not Available,True
13,58980778,Baby,Male,Medium,Domestic Short Hair,False,Black,True,True,False,False,True,92057,True,Not Available,True
14,58980506,Young,Female,Medium,Domestic Short Hair,False,Torbie,True,True,False,False,True,50126,Not Available,Not Available,True


Now we make our train, dev, and test sets. Content-Based Filtering does not need the dev or test set, but it is very RAM hungry because it builds a pairwise similarity matrix, so it can't use the full data set. It is also good practice to keep the train set separate from the rest. All models will therefore be built on the train set only.
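To put the RAM concern in numbers, here is a back-of-the-envelope estimate. It assumes the similarity kernels return a dense float64 matrix with one entry per pair of cats; the row counts are the ones this workbook reports for the train split and for the full preprocessed frame.

```python
n_train = 37835          # rows in x_train, as reported by this workbook
n_full = 46710           # rows in X_transform before splitting
bytes_per_value = 8      # assumption: dense float64 similarity matrix

matrix_bytes = n_train * n_train * bytes_per_value
full_bytes = n_full * n_full * bytes_per_value

print(f"train only: {matrix_bytes / 1e9:.2f} GB")
print(f"full data:  {full_bytes / 1e9:.2f} GB")
```

Memory grows with the square of the row count, which is why even a modest reduction in rows helps noticeably.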

In [24]:
# split data
x_t, x_test = train_test_split(X_transform,test_size=0.1,train_size=0.9, random_state=13)
x_train, x_dev = train_test_split(x_t,test_size = 0.1,train_size =0.9, random_state=13)

In [25]:
x_train_index = x_train.index
x_train.shape

(37835, 16)

In [26]:
x_dev_index = x_dev.index
x_dev.shape

(4204, 16)

In [27]:
x_test_index = x_test.index
x_test.shape

(4671, 16)

In [43]:
x_train.sample(3)

Unnamed: 0,id,age,gender,size,breeds.primary,breeds.mixed,colors.primary,attributes.spayed_neutered,attributes.house_trained,attributes.declawed,attributes.special_needs,attributes.shots_current,contact.address.postcode,environment.children,environment.dogs,environment.cats
14898,58934352,Baby,Male,Medium,Bombay,True,Black,True,True,False,False,True,36555,True,Not Available,True
12097,58759594,Adult,Female,Large,Domestic Short Hair,False,Not Available,True,False,False,False,False,30040,Not Available,Not Available,Not Available
13913,58706766,Baby,Female,Medium,Dilute Calico,False,Dilute Calico,True,True,False,False,True,78704,Not Available,Not Available,Not Available


In [29]:
train_context = cats_DF_context.loc[x_train_index]
train_context.shape

(37835, 30)

In [42]:
train_context.head(3)

Unnamed: 0,id,organization_id,url,type,tags,name,description,organization_animal_id,photos,primary_photo_cropped,videos,status,status_changed_at,published_at,distance,contact.email,contact.phone,contact.address.address1,contact.address.address2,contact.address.city,contact.address.state,contact.address.postcode,contact.address.country,animal_id,animal_type,organization_id.1,primary_photo_cropped.small,primary_photo_cropped.medium,primary_photo_cropped.large,primary_photo_cropped.full
6911,58957700,PA1153,https://www.petfinder.com/cat/bobcat-58957700/...,Cat,[],Bobcat,Bob was found on the street as a baby and unfo...,18695629-KS00907,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-11-24T16:35:52+0000,2022-11-24T16:35:50+0000,,cats@kittensnatchers.org,,,,Philadelphia,PA,19148,US,58957700,cat,pa1153,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...
12255,58935305,TN676,https://www.petfinder.com/cat/john-b-kitten-58...,Cat,[],John B kitten,"Super friendly, raised in rescue with siblings...",18762550,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-11-22T08:13:29+0000,2022-11-22T08:13:27+0000,,HPKRrescue@gmail.com,(865) 765-3400,,,Knoxville,TN,37917,US,58935305,cat,tn676,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...
38273,58769865,WI507,https://www.petfinder.com/cat/tapioca-58769865...,Cat,[],Tapioca,Tapioca (aka Puddin&amp;#39;) is a very friend...,37222,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-11-05T21:27:45+0000,2022-11-05T21:27:43+0000,,adoptions@scadopt.org,(920) 458-2012,3209 N 21st Street,,Sheboygan,WI,53083,US,58769865,cat,wi507,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...


The entries match! We need to pass our models the feature data used to compute similarity between cats, along with the context data that goes with it. As long as the indexes are the same, we can stitch the two back together.
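The index-based stitching can be sketched on two toy frames that share an index and an 'id' key. The ids and index labels here are made up for illustration.

```python
import pandas as pd

# Two toy tables with the same index labels and the same 'id' key.
features = pd.DataFrame(
    {"id": [11, 22, 33, 44], "age": ["Baby", "Adult", "Young", "Adult"]},
    index=[1, 13, 14, 20],
)
context = pd.DataFrame(
    {"id": [11, 22, 33, 44], "name": ["Zorro", "Sammy", "Girly", "Beau"]},
    index=[1, 13, 14, 20],
)

subset = features.loc[[14, 1]]               # e.g. rows picked by a split
subset_context = context.loc[subset.index]   # same labels, same order

# The shared 'id' key confirms the two tables line up row for row.
ids_match = bool((subset["id"].values == subset_context["id"].values).all())
print(ids_match, subset_context["name"].tolist())
```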

Now let's do it for dev and test sets too!

In [31]:
dev_context = cats_DF_context.loc[x_dev_index]
dev_context.shape

(4204, 30)

In [32]:
test_context = cats_DF_context.loc[x_test_index]
test_context.shape

(4671, 30)

In [33]:
cats_DF_context.head(3)

Unnamed: 0,id,organization_id,url,type,tags,name,description,organization_animal_id,photos,primary_photo_cropped,videos,status,status_changed_at,published_at,distance,contact.email,contact.phone,contact.address.address1,contact.address.address2,contact.address.city,contact.address.state,contact.address.postcode,contact.address.country,animal_id,animal_type,organization_id.1,primary_photo_cropped.small,primary_photo_cropped.medium,primary_photo_cropped.large,primary_photo_cropped.full
1,58980784,TN589,https://www.petfinder.com/cat/zorro-58980784/t...,Cat,"['Friendly', 'Affectionate', 'Playful', 'Funny...",Zorro,Zorro is very sweet and enjoys being in your l...,,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-11-28T02:29:46+0000,2022-11-28T02:29:45+0000,,bulldog50@epbfi.com,,,,Hixson,TN,37343,US,58980784,cat,tn589,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...
13,58980778,CA2825,https://www.petfinder.com/cat/sammy-58980778/c...,Cat,"['Friendly', 'Playful', 'Loves kisses', 'Athle...",Sammy,"“Sammy” is a tiny black, male kitten. About 10...",,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-11-28T02:29:10+0000,2022-11-28T02:29:09+0000,,info@sunriserescue.com,,,,Oceanside,CA,92057,US,58980778,cat,ca2825,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...
14,58980506,IA16,https://www.petfinder.com/cat/girly-58980506/i...,Cat,"['Friendly', 'Gentle', 'Dignified']",Girly,Girly is a dainty lady! She enjoys getting pet...,7.0,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,"[{'embed': '<iframe title=""Video"" frameborder=...",adoptable,2022-11-28T02:28:45+0000,2022-11-28T02:28:44+0000,,greenbelthumane@hotmail.com,(641) 648-2692,319 River St.,,Iowa Falls,IA,50126,US,58980506,cat,ia16,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...


Since we know the indexes match, let's get rid of the id columns.

In [34]:
x_train = x_train.reset_index(drop=True) # required so keys work properly
x_train_woID = x_train.drop(columns='id')
x_train_woID.dtypes

age                           object
gender                        object
size                          object
breeds.primary                object
breeds.mixed                  object
colors.primary                object
attributes.spayed_neutered    object
attributes.house_trained      object
attributes.declawed           object
attributes.special_needs      object
attributes.shots_current      object
contact.address.postcode      object
environment.children          object
environment.dogs              object
environment.cats              object
dtype: object

In [35]:
x_dev_woID = x_dev.drop(columns='id')
x_dev_woID.dtypes

age                           object
gender                        object
size                          object
breeds.primary                object
breeds.mixed                  object
colors.primary                object
attributes.spayed_neutered    object
attributes.house_trained      object
attributes.declawed           object
attributes.special_needs      object
attributes.shots_current      object
contact.address.postcode      object
environment.children          object
environment.dogs              object
environment.cats              object
dtype: object

In [36]:
x_test_woID = x_test.drop(columns='id')
x_test_woID.dtypes

age                           object
gender                        object
size                          object
breeds.primary                object
breeds.mixed                  object
colors.primary                object
attributes.spayed_neutered    object
attributes.house_trained      object
attributes.declawed           object
attributes.special_needs      object
attributes.shots_current      object
contact.address.postcode      object
environment.children          object
environment.dogs              object
environment.cats              object
dtype: object

In [37]:
X_transform_woID = X_transform.drop(columns='id')

Notice that the id columns are gone but the row order is unchanged, so we can recover the IDs later! Now we can apply One-Hot Encoding!

In [38]:
ohe = OneHotEncoder().fit(X_transform_woID) # One-Hot Encoding works much better here; fit on whole X so every category is known
X_train_transform = ohe.transform(x_train_woID) # ids are not needed here because the row order is preserved
X_dev_transform  = ohe.transform(x_dev_woID)
X_test_transform = ohe.transform(x_test_woID)
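As a small illustration of the fit-on-everything, transform-a-subset pattern used above, the sketch below encodes a made-up two-column frame. Every encoded row contains exactly one 1 per original column, so a cat described by 15 categorical columns ends up with 15 ones in its vector.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy frame: two categorical columns with three categories each.
toy = pd.DataFrame({
    "age": ["Baby", "Adult", "Young"],
    "environment.dogs": ["True", "Not Available", "False"],
})

encoder = OneHotEncoder().fit(toy)          # learn every category from the full frame
encoded = encoder.transform(toy.iloc[:2])   # transform only a subset of rows

row_sums = encoded.toarray().sum(axis=1)    # one 1 per original column
print(encoded.shape)
print(row_sums)
```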

# Run Content-Based-Filtering Modeling Iterations <a id='run_pipeline'></a>

Content-Based Filtering is a method of comparing products against each other when you don't have user rankings. This can be a simple way to create models before user ranking data is available and can often do well in recommending similar products. In our case, products are cats. Let's explore a few options for Content-Based Filtering and see how they do.
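Before running the three options on the real data, here is a minimal sketch of how they behave on a toy one-hot matrix. The three "cats" are made up; the kernel functions are the scikit-learn ones imported earlier, and `laplacian_kernel` is left at its default `gamma`, which scikit-learn documents as 1 divided by the number of columns.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, laplacian_kernel, linear_kernel

# Three toy cats, two categorical features with two categories each (one-hot encoded).
X = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],   # differs from cat 0 in the second feature
    [0, 1, 0, 1],   # differs from cat 0 in both features
], dtype=float)

linear = linear_kernel(X)        # counts the features two cats share
cosine = cosine_similarity(X)    # same count, scaled to the range [0, 1]
laplace = laplacian_kernel(X)    # exp(-gamma * L1 distance)

print(linear[0])
print(cosine[0])
print(laplace[0])
```

Because every one-hot row has the same number of ones, the cosine score here is just the linear score divided by the number of original features, so the two kernels rank cats identically on this kind of data.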

## Linear Similarity Results <a id='ls'></a>

In [None]:
#Linear_Model =linear_similarities(X_train_transform,x_train) #run similarities with linear kernel


In [None]:
#joblib.dump(Linear_Model, '/content/drive/MyDrive/MLE10PetMatch/models/linear_similarity_model.pkl')

['/content/drive/MyDrive/MLE10PetMatch/models/linear_similarity_model.pkl']

In [None]:
Linear_Model = joblib.load('/content/drive/MyDrive/MLE10PetMatch/models/linear_similarity_model_catsv1.pkl')

In [None]:
pd.options.display.max_colwidth = 100
recommend(item_id=58806733, num=5,df=train_context,reccs=Linear_Model)

['Recommending 5 cats similar to [58806733]... https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58806733/1/?bust=1668025870 - https://www.petfinder.com/cat/palomino-58806733/nm/las-cruces/cats-meow-adoption-center-nm198/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
-------
['Recommended: [58670965] (score:12.0) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58670965/4/?bust=1669340336 - https://www.petfinder.com/cat/jacob-and-wilhelm-58670965/sk/meadow-lake/meadow-lake-and-district-humane-society-sk12/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58972161] (score:12.0) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58972161/1/?bust=1669500200 - https://www.petfinder.com/cat/mango-58972161/nm/las-cruces/cats-meow-adoption-center-nm198/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58807305] (score:12.0) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58807305/1/?bust=1668028273 - https://www.petfinder.com/cat/keno-58807305/nm/las-cru

The above shows the scores for a single cat only, so now let's get an idea of how well this does across the entire training set.

In [None]:
# Gather average score of top 5 recommendations for training set, with a max score of 15!
linearScore = score(reccs=Linear_Model, num=5)
linearScore

Finding average recommendation score for top 5 recommendations per example
There are 189175 results with a sum of 2604239.0 and an average of: 13.766295757896128


13.766295757896128

The overall score for the whole training set for the Linear Kernel is 13.77/15, or 0.918.

## Cosine similarity results <a id='cs'></a>

In [None]:
#Cosine_Model =cosine_similarities(X_train_transform,x_train) #run similarities with cosine similarity


In [None]:
#joblib.dump(Cosine_Model, '/content/drive/MyDrive/MLE10PetMatch/models/cosine_similarity_model_catsv1.pkl')

['/content/drive/MyDrive/MLE10PetMatch/models/cosine_similarity_model_catsv1.pkl']

In [39]:
Cosine_Model = joblib.load('/content/drive/MyDrive/MLE10PetMatch/models/cosine_similarity_model_catsv1.pkl')

In [46]:
pd.options.display.max_colwidth = 100
recommend(item_id=58706766, num=5,df=train_context,reccs=Cosine_Model)

['Recommending 5 cats similar to [58706766]... https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58706766/2/?bust=1667159800 - https://www.petfinder.com/cat/pjs-58706766/tx/austin/new-hope-animal-rescue-nfp-tx2339/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
-------
['Recommended: [58706847] (score:0.9333333333333331) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58706847/2/?bust=1667160445 - https://www.petfinder.com/cat/sleepy-spice-58706847/tx/austin/new-hope-animal-rescue-nfp-tx2339/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58926364] (score:0.9333333333333331) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58926364/1/?bust=1669040782 - https://www.petfinder.com/cat/squidlet-58926364/va/alexandria/tails-high-inc-va540/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58926359] (score:0.9333333333333331) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58926359/1/?bust=1669040670 - https://www.petfinder.com/cat/sushi-58926359/va/

The output above scores a single item only, so now let's get an idea of how well this does for the entire training set.

In [None]:
# Gather average score of top 5 recommendations for training set, with a max score of 1!
cosineScore = score(reccs=Cosine_Model, num=5)
cosineScore

Finding average reccomendation score for top 5 reccomendations per example
There are 189175results with a sum of173615.9333331211and and average of: 0.9177530505252867


0.9177530505252867

The overall score for the whole training set for Cosine Similarity is 0.918.

## Laplacian Similarity Results <a id='lp'></a>

In [None]:
#lp_Model =laplacian_similarities(X_train_transform,x_train) #run similarities with Laplacian kernel

In [None]:
#joblib.dump(lp_Model, '/content/drive/MyDrive/MLE10PetMatch/models/laplace_similarity_model_catsv1.pkl')

['/content/drive/MyDrive/MLE10PetMatch/models/laplace_similarity_model_catsv1.pkl']

In [40]:
lp_Model = joblib.load('/content/drive/MyDrive/MLE10PetMatch/models/laplace_similarity_model_catsv1.pkl')

In [47]:
pd.options.display.max_colwidth = 100
recommend(item_id=58706766, num=5,df=train_context,reccs=lp_Model)

['Recommending 5 cats similar to [58706766]... https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58706766/2/?bust=1667159800 - https://www.petfinder.com/cat/pjs-58706766/tx/austin/new-hope-animal-rescue-nfp-tx2339/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
-------
['Recommended: [58706847] (score:0.9995024875724535) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58706847/2/?bust=1667160445 - https://www.petfinder.com/cat/sleepy-spice-58706847/tx/austin/new-hope-animal-rescue-nfp-tx2339/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58926364] (score:0.9995024875724535) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58926364/1/?bust=1669040782 - https://www.petfinder.com/cat/squidlet-58926364/va/alexandria/tails-high-inc-va540/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58926359] (score:0.9995024875724535) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58926359/1/?bust=1669040670 - https://www.petfinder.com/cat/sushi-58926359/va/

The output above scores a single item only, so now let's get an idea of how well this does for the entire training set.

In [None]:
# Gather average score of top 5 recommendations for training set, with a max score of 1!
lpScore = score(reccs=lp_Model, num=5)
lpScore

Finding average reccomendation score for top 5 reccomendations per example
There are 189175results with a sum of189058.90931529008and and average of: 0.9993863317842742


0.9993863317842742

The overall score for the whole training set for Laplacian Similarity is .999.

## Overall Content Based Filtering results as of 1/2/2023 <a id='ov'></a>

In [None]:
from tabulate import tabulate
table = [['Model Name', 'Score'],
         ['Linear Kernel',linearScore],
         ['Cosine Similarity',cosineScore],
         ['Laplacian Kernel',lpScore]]
print(tabulate(table,headers='firstrow',tablefmt='fancy_grid'))

╒═══════════════════╤═══════════╕
│ Model Name        │     Score │
╞═══════════════════╪═══════════╡
│ Linear Kernel     │ 13.7663   │
├───────────────────┼───────────┤
│ Cosine Similarity │  0.917753 │
├───────────────────┼───────────┤
│ Laplacian Kernel  │  0.999386 │
╘═══════════════════╧═══════════╛


1. All three similarity measures return the same top 5 results for our test instance.
2. The only difference is in how the scores are calculated, so the raw numbers alone are a misleading basis for choosing one measure over another. That said, cosine similarity appears to be the most sensitive to differences and is currently the preferred content-based similarity model.
3. Additional model iterations will attempt to tease these content-based filtering results apart further, but for now they all seem to work as intended. Cosine similarity and the Laplacian kernel are convenient because their output scores are bounded between 0 and 1.
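
To make the point about how the three scores are calculated concrete, here is a minimal sketch in plain Python on two made-up binary feature vectors. The vectors and the helper names are illustrative assumptions, not the notebook's `X_train_transform` or its similarity helpers. The Laplacian form `exp(-gamma * L1 distance)` with `gamma = 1 / n_features` follows the default documented for scikit-learn's `laplacian_kernel`; whether the notebook used that default is not shown here.

```python
import math

def linear_score(a, b):
    """Dot product: unbounded, grows with the number of shared active features."""
    return sum(x * y for x, y in zip(a, b))

def cosine_score(a, b):
    """Dot product divided by both vector lengths, so it is at most 1."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return linear_score(a, b) / (norm_a * norm_b)

def laplacian_score(a, b, gamma=None):
    """exp(-gamma * L1 distance); gamma defaults to 1 / n_features (assumption)."""
    if gamma is None:
        gamma = 1.0 / len(a)
    l1_distance = sum(abs(x - y) for x, y in zip(a, b))
    return math.exp(-gamma * l1_distance)

# Hypothetical one-hot style feature vectors for two cats (illustration only).
cat_a = [1, 0, 1, 1, 0, 1]
cat_b = [1, 0, 1, 0, 1, 1]

print("linear   :", linear_score(cat_a, cat_b))
print("cosine   :", cosine_score(cat_a, cat_b))
print("laplacian:", laplacian_score(cat_a, cat_b))
```

Comparing a vector with itself gives 1.0 for the cosine and Laplacian scores but the count of active features for the linear score, which is why the linear result has to be divided by its maximum before it can be compared with the other two.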

# Collaborative Filtering - Under Construction WIP <a id='cf'></a>

Collaborative Filtering uses rankings to recommend new products to customers, and there are several approaches one can take. For this first iteration, we will use a model-based SVD (Matrix Factorization) approach on user-item interactions.
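
As a rough illustration of the model-based idea, the sketch below factorizes a tiny user-item matrix with NumPy's `np.linalg.svd` and keeps only the largest singular values. The matrix, the choice of `k = 2` latent factors, and the variable names are all assumptions made for this example; it is not the model this notebook ends up training.

```python
import numpy as np

# Hypothetical 4-user x 5-cat preference matrix (1 = liked, 0 = disliked or unseen).
ratings = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
], dtype=float)

k = 2  # number of latent factors to keep (arbitrary choice for this toy example)

# SVD: ratings = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Keep only the k largest singular values to get a low-rank approximation.
predicted = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Among the cats user 0 has not liked, pick the one with the highest predicted score.
user = 0
unseen = np.where(ratings[user] == 0)[0]
best_cat = unseen[np.argmax(predicted[user, unseen])]

print("predicted scores for user 0:", np.round(predicted[user], 2))
print("recommended cat column for user 0:", best_cat)
```

In this toy matrix the low-rank reconstruction gives user 0 a noticeably higher score for column 3, the cat liked by the user with the most similar tastes, than for the other unseen columns.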

## Upload and Prep the Data

In [None]:
import pandas as pd
cat_rankings = pd.read_csv("/content/drive/MyDrive/MLE10PetMatch/petmatch_rankings_cats.csv",header=0,index_col=0)
cat_rankings.shape

(194, 2)

In [None]:
'''
rating dataframe will look like this
| user_id | item_id | rating          |
|---------|---------|-----------------|
| 1       | 1       | 5               |
| ...     | ...     | ...             |
| n       | m       | 3               |
'''
cat_rankings.head(3)

Unnamed: 0_level_0,cat_id,preference
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1
Denise,58935988,0
Denise,58708840,1
Denise,58969335,0


First, let's make a train-test split.

In [None]:
# before we split, user ranking counts
cat_rankings.groupby('user_name').count()

Unnamed: 0_level_0,cat_id,preference
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1
1,8,8
3,62,62
Denise,32,32
Matt,92,92


In [None]:
cf_train, cf_test = train_test_split(cat_rankings,test_size=0.2,train_size=0.8, random_state=12)


In [None]:
cf_test.groupby('user_name').count()

Unnamed: 0_level_0,cat_id,preference
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1
1,3,3
3,12,12
Denise,9,9
Matt,15,15


In [None]:
cf_train.groupby('user_name').count()

Unnamed: 0_level_0,cat_id,preference
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1
1,5,5
3,50,50
Denise,23,23
Matt,77,77


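This train-test split keeps roughly an 80/20 balance overall (155 training rows, 39 test rows), although users with few rankings drift from it: user 1 has 3 of 8 rankings in the test set. This should work.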
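
The counts above show that a purely random split can leave users with few rankings off the 80/20 target. One option, sketched here under assumptions, is to sample the test rows per user with pandas `groupby(...).sample(frac=...)`. The toy frame and the `toy_*` names are hypothetical stand-ins for `cat_rankings`, not data from this notebook.

```python
import pandas as pd

# Hypothetical rankings in the same shape as cat_rankings: index = user_name.
toy_rankings = pd.DataFrame(
    {
        "cat_id": range(100, 130),
        "preference": [0, 1] * 15,
    },
    index=pd.Index(["Denise"] * 10 + ["Matt"] * 20, name="user_name"),
)

# Move user_name into a column so every row has a unique integer index.
flat = toy_rankings.reset_index()

# Draw 20% of each user's rows for the test set, then keep the rest for training.
test_rows = flat.groupby("user_name").sample(frac=0.2, random_state=12)
train_rows = flat.drop(test_rows.index)

# Restore the original layout (user_name as the index).
toy_test = test_rows.set_index("user_name")
toy_train = train_rows.set_index("user_name")

print(toy_test.groupby("user_name").size())
print(toy_train.groupby("user_name").size())
```

Sampling within each user keeps every user's share of test rows close to the requested fraction, at the cost of a slightly more involved split than a single `train_test_split` call.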

As the tables above show, we first need to reshape our data into the user-item format.

In [None]:
user_item_mat_train = pd.DataFrame()
user_item_mat_train['user']= pd.unique(cat_rankings.index) # add unique users first
user_item_mat_train[[cats_DF['id']]] = 0 # assume dislike if no data
user_item_mat_train.head(3)



Unnamed: 0,user,58980784,58980778,58980506,...,58656109,58656099,58656082,58656004
0,Denise,0,0,0,...,0,0,0,0
1,Matt,0,0,0,...,0,0,0,0
2,1,0,0,0,...,0,0,0,0


In [None]:
user_item_mat_test = user_item_mat_train.copy() # make blank template for test as well

In [None]:
user_item_mat_test.head(3)

Unnamed: 0,user,58980784,58980778,58980506,...,58656109,58656099,58656082,58656004
0,Denise,0,0,0,...,0,0,0,0
1,Matt,0,0,0,...,0,0,0,0
2,1,0,0,0,...,0,0,0,0


Now that we have default tables, let's update them with the rankings we have saved so far!

In [None]:
for index, row in cf_train.iterrows(): #update train table
    #print(index) # shows names
    indextoChange = user_item_mat_train[user_item_mat_train['user']==index].index #row label(s) for this user
    columntoChange = row['cat_id'] # animal to update (label-based, avoids positional row[0])
    preferencetoChange = row['preference'] # ranking for animal to use
    user_item_mat_train.loc[indextoChange,columntoChange] = preferencetoChange # update cell in dataframe (.loc accepts a non-scalar row selection)

In [None]:
for index, row in cf_test.iterrows(): #update test table
    #print(index) # shows names
    indextoChange = user_item_mat_test[user_item_mat_test['user']==index].index #row label(s) for this user
    columntoChange = row['cat_id'] # animal to update (label-based, avoids positional row[0])
    preferencetoChange = row['preference'] # ranking for animal to use
    user_item_mat_test.loc[indextoChange,columntoChange] = preferencetoChange # update cell in dataframe (.loc accepts a non-scalar row selection)
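
The row-by-row loops above can also be expressed as a single pivot. The following is a minimal sketch using pandas `pivot_table` and `reindex`; `toy_train` and `all_cat_ids` are hypothetical stand-ins for `cf_train` and `cats_DF['id']`, and `aggfunc="max"` (a like wins if a user ranked the same cat twice) is an assumption rather than something the notebook specifies.

```python
import pandas as pd

# Hypothetical stand-ins for cf_train and cats_DF['id'] (illustration only).
toy_train = pd.DataFrame(
    {"cat_id": [101, 103, 102, 103], "preference": [1, 0, 1, 1]},
    index=pd.Index(["Denise", "Denise", "Matt", "Matt"], name="user_name"),
)
all_cat_ids = [101, 102, 103, 104]

# One row per user, one column per rated cat, cells hold the preference.
user_item = toy_train.reset_index().pivot_table(
    index="user_name", columns="cat_id", values="preference", aggfunc="max"
)

# Add columns for cats nobody has rated yet and treat missing ratings as dislike (0).
user_item = user_item.reindex(columns=all_cat_ids).fillna(0).astype(int)

print(user_item)
```

This avoids iterating over rows in Python and builds the same kind of table: users as rows, every adoptable cat as a column, and 0 wherever no ranking exists.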

In [None]:
user_item_mat_train.head(5)

Unnamed: 0,user,58980784,58980778,58980506,...,58656109,58656099,58656082,58656004
0,Denise,0,0,0,...,0,0,0,0
1,Matt,0,0,0,...,0,0,0,0
2,1,0,0,0,...,0,0,0,0
3,3,0,0,0,...,0,0,0,0


In [None]:
user_item_mat_test.head(5)

Unnamed: 0,user,58980784,58980778,58980506,58980757,58980756,58980742,58980740,58980734,58980714,58980708,58980707,58980705,58980697,58980698,58980694,58980695,58980688,58980681,58980682,58980680,58980679,58980686,58980687,58980685,58980683,58980678,58980675,58980673,58980665,58980664,58980661,58980662,58980652,58980647,58980650,58980649,58980634,58980606,58980601,58980589,58980583,58980568,58980567,58980554,58980548,58980549,58928844,58980534,58980529,58980392,58980526,58980514,58980510,58980511,58980509,58980489,58980486,58980485,58980478,58980476,58980477,58980474,58980471,58980457,58980453,58980452,58980451,58980446,58980442,58980436,58980426,58980424,58980422,58980412,58980399,58980398,58980397,58980393,58980390,58980391,58980384,58980385,58980383,58980382,58980379,58980372,58980370,58980368,58980361,58980360,58980357,58980353,58980350,58980349,58980348,58980340,58980321,58980322,58980320,58980311,58980310,58980309,58980308,58980304,58980275,58980276,58980277,58980278,58980273,58980271,58980272,58980274,58980269,58980265,58980264,58980263,58980262,58980259,58980249,58980248,58980254,58980255,58980256,58980252,58980250,58980251,58980241,58980242,58980232,58980239,58980227,58980234,58980237,58980220,58980219,58980226,58980231,58980236,58980221,58980224,58980222,58980223,58980229,58980214,58980212,58980206,58980205,58980201,58980199,58980198,58980188,58980190,58980189,58980191,58980193,58980192,58980194,58980187,58980181,58980183,58980182,58980180,58980166,58980164,58980161,58980149,58980148,58980145,58980128,58980121,58980125,58980122,58972567,58980120,58972582,58980115,58980113,58980114,58980110,58980112,58980107,58980101,58980098,58980095,58980093,58980091,58980082,58980081,58980071,58980069,58980068,58980059,58980058,58980056,58980055,58980054,58980043,58980052,58980051,58980047,58980050,58980049,58980048,58980046,58980044,58980045,58980041,58980039,58980038,58980037,58980034,58980031,58980030,58980029,58980025,58980023,58980020,58980021,58980018,58980017,5898
0008,58979983,58980005,58980000,58979999,58979991,58979992,58979989,58979988,58979986,58979985,58979984,58979977,58979980,58979979,58979978,58978013,58979963,58979962,58979961,58979960,58979959,58979958,58979948,58979947,58979946,58979764,58979938,58979932,...,58660713,58660712,58660700,58660693,58660701,58660689,58660687,58660691,58660685,58660690,58660672,58660637,58660648,58660646,58660624,58660540,58660522,58660433,58660432,58660428,58660386,58660272,58660267,58660262,58660261,58660260,58660250,58660251,58660253,58660201,58660200,58660172,58660192,58660179,58660186,58660193,58660177,58660175,58660191,58660176,58660190,58660110,58660095,58660063,58660060,58660057,58659979,58659960,58659945,58659937,58659846,58659905,58659899,58659880,58659870,58659867,58659865,58659858,58659839,58659831,58659817,58659816,58659804,58659822,58659784,58659743,58659695,58659692,58659658,58659659,58659660,58659655,58659630,58659551,58659528,58658597,58659433,58659399,58659394,58659300,58659221,58659254,58659161,58659136,58659089,58659092,58659098,58659083,58659099,58659085,58659090,58659081,58659086,58659094,58659061,58658978,58658919,58658777,58658743,58658666,58658655,58658616,58658571,58658559,58658549,58658550,58658547,58658490,58658507,58658510,58658463,58658457,58658431,58658424,58658423,58658416,58658419,58658410,58658407,58658392,58658404,58658395,58658403,58658398,58658333,58658336,58658324,58658307,58658266,58658257,58658254,58658163,58658161,58658152,58658086,58657915,58657844,58657836,58657840,58657732,58657728,58657725,58657704,58657673,58657636,58657597,58657550,58657543,58657542,58657540,58657510,58657377,58657343,58657289,58657276,58657269,58657238,58657217,58657187,58657184,58657179,58657176,58657175,58657173,58657130,58657123,58657126,58657103,58657101,58657085,58657075,58657057,58657054,58657055,58657051,58657046,58657035,58657032,58657026,58656993,58656985,58656976,58656955,58656949,58656938,58656877,58656860,58656855,58656845,58656838,58656837,58656832,58656822,58
656802,58656790,58656744,58656736,58656730,58656729,58656726,58656711,58656703,58656701,58656673,58656650,58656634,58656635,58656631,58656628,58656611,58656608,58656584,58656519,58656465,58656492,58656493,58656491,58656489,58656490,58656488,58656486,58656477,58656475,58656472,58656462,58656452,58656448,58656445,58656430,58656399,58656401,58656400,58656357,58656356,58656344,58656306,58656301,58656292,58656289,58656257,58656256,58656246,58656153,58656144,58656131,58656130,58656109,58656099,58656082,58656004
0,Denise,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Matt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
user_item_mat_train.astype(bool).sum(axis=1) # preferences were correctly assigned!

0    11
1     1
2     4
3     6
dtype: int64

In [None]:
user_item_mat_test.astype(bool).sum(axis=1) # preferences were correctly assigned!

0    7
1    1
2    2
3    2
dtype: int64

Collaborative Filtering needs numeric user IDs, so we replace each user name with an assigned number.

In [None]:
userTable = pd.DataFrame()
userTable['user'] = user_item_mat_train['user']
userTable['userId'] = pd.Series(range(0,user_item_mat_train['user'].shape[0]))
userTable

Unnamed: 0,user,userId
0,Denise,0
1,Matt,1
2,1,2
3,3,3
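
As a hedged alternative to building the ID column from row positions, the mapping could be keyed on the user name, so that the train and test matrices stay aligned even if their rows are in a different order. The sketch below uses `pd.factorize`; the toy frames and the `user_to_id` name are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for the notebook's matrices (illustrative data only).
train = pd.DataFrame({"user": ["Denise", "Matt", "1", "3"], "58980784": [0, 1, 0, 0]})
test = pd.DataFrame({"user": ["Matt", "Denise", "3", "1"], "58980784": [1, 0, 0, 0]})

# factorize lists the unique names in order of first appearance.
_, names = pd.factorize(train["user"])
user_to_id = dict(zip(names, range(len(names))))

# Map by name rather than by row position, so row order does not matter.
train_ids = train["user"].map(user_to_id)
test_ids = test["user"].map(user_to_id)

print(user_to_id)
print(test_ids.tolist())
```

A name-keyed lookup like this also makes it easy to translate IDs back to names when presenting recommendations.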


In [None]:
# Now update our main table with user IDs instead!
user_item_mat_train['user'] = userTable['userId']
user_item_mat_test['user'] = userTable['userId']
user_item_mat_train.head(5)

Unnamed: 0,user,58980784,58980778,58980506,58980757,58980756,58980742,58980740,58980734,58980714,58980708,58980707,58980705,58980697,58980698,58980694,58980695,58980688,58980681,58980682,58980680,58980679,58980686,58980687,58980685,58980683,58980678,58980675,58980673,58980665,58980664,58980661,58980662,58980652,58980647,58980650,58980649,58980634,58980606,58980601,58980589,58980583,58980568,58980567,58980554,58980548,58980549,58928844,58980534,58980529,58980392,58980526,58980514,58980510,58980511,58980509,58980489,58980486,58980485,58980478,58980476,58980477,58980474,58980471,58980457,58980453,58980452,58980451,58980446,58980442,58980436,58980426,58980424,58980422,58980412,58980399,58980398,58980397,58980393,58980390,58980391,58980384,58980385,58980383,58980382,58980379,58980372,58980370,58980368,58980361,58980360,58980357,58980353,58980350,58980349,58980348,58980340,58980321,58980322,58980320,58980311,58980310,58980309,58980308,58980304,58980275,58980276,58980277,58980278,58980273,58980271,58980272,58980274,58980269,58980265,58980264,58980263,58980262,58980259,58980249,58980248,58980254,58980255,58980256,58980252,58980250,58980251,58980241,58980242,58980232,58980239,58980227,58980234,58980237,58980220,58980219,58980226,58980231,58980236,58980221,58980224,58980222,58980223,58980229,58980214,58980212,58980206,58980205,58980201,58980199,58980198,58980188,58980190,58980189,58980191,58980193,58980192,58980194,58980187,58980181,58980183,58980182,58980180,58980166,58980164,58980161,58980149,58980148,58980145,58980128,58980121,58980125,58980122,58972567,58980120,58972582,58980115,58980113,58980114,58980110,58980112,58980107,58980101,58980098,58980095,58980093,58980091,58980082,58980081,58980071,58980069,58980068,58980059,58980058,58980056,58980055,58980054,58980043,58980052,58980051,58980047,58980050,58980049,58980048,58980046,58980044,58980045,58980041,58980039,58980038,58980037,58980034,58980031,58980030,58980029,58980025,58980023,58980020,58980021,58980018,58980017,5898
0008,58979983,58980005,58980000,58979999,58979991,58979992,58979989,58979988,58979986,58979985,58979984,58979977,58979980,58979979,58979978,58978013,58979963,58979962,58979961,58979960,58979959,58979958,58979948,58979947,58979946,58979764,58979938,58979932,...,58660713,58660712,58660700,58660693,58660701,58660689,58660687,58660691,58660685,58660690,58660672,58660637,58660648,58660646,58660624,58660540,58660522,58660433,58660432,58660428,58660386,58660272,58660267,58660262,58660261,58660260,58660250,58660251,58660253,58660201,58660200,58660172,58660192,58660179,58660186,58660193,58660177,58660175,58660191,58660176,58660190,58660110,58660095,58660063,58660060,58660057,58659979,58659960,58659945,58659937,58659846,58659905,58659899,58659880,58659870,58659867,58659865,58659858,58659839,58659831,58659817,58659816,58659804,58659822,58659784,58659743,58659695,58659692,58659658,58659659,58659660,58659655,58659630,58659551,58659528,58658597,58659433,58659399,58659394,58659300,58659221,58659254,58659161,58659136,58659089,58659092,58659098,58659083,58659099,58659085,58659090,58659081,58659086,58659094,58659061,58658978,58658919,58658777,58658743,58658666,58658655,58658616,58658571,58658559,58658549,58658550,58658547,58658490,58658507,58658510,58658463,58658457,58658431,58658424,58658423,58658416,58658419,58658410,58658407,58658392,58658404,58658395,58658403,58658398,58658333,58658336,58658324,58658307,58658266,58658257,58658254,58658163,58658161,58658152,58658086,58657915,58657844,58657836,58657840,58657732,58657728,58657725,58657704,58657673,58657636,58657597,58657550,58657543,58657542,58657540,58657510,58657377,58657343,58657289,58657276,58657269,58657238,58657217,58657187,58657184,58657179,58657176,58657175,58657173,58657130,58657123,58657126,58657103,58657101,58657085,58657075,58657057,58657054,58657055,58657051,58657046,58657035,58657032,58657026,58656993,58656985,58656976,58656955,58656949,58656938,58656877,58656860,58656855,58656845,58656838,58656837,58656832,58656822,58
656802,58656790,58656744,58656736,58656730,58656729,58656726,58656711,58656703,58656701,58656673,58656650,58656634,58656635,58656631,58656628,58656611,58656608,58656584,58656519,58656465,58656492,58656493,58656491,58656489,58656490,58656488,58656486,58656477,58656475,58656472,58656462,58656452,58656448,58656445,58656430,58656399,58656401,58656400,58656357,58656356,58656344,58656306,58656301,58656292,58656289,58656257,58656256,58656246,58656153,58656144,58656131,58656130,58656109,58656099,58656082,58656004
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


This dataframe looks correct. We only have four users at the moment so we will need to find more.

## Run Collaborative Filtering

We will be using the surprise library for Collaborative Filtering.

In [None]:
!pip install surprise

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Collecting scikit-surprise
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)
     |████████████████████████████████| 771 kB 4.2 MB/s
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.3-cp38-cp38-linux_x86_64.whl size=2626453 sha256=cf555b5aa50f6333d0dc5c961093b8e34b882c214a288de87e9d66e77357d97a
  Stored in directory: /root/.cache/pip/wheels/af/db/86/2c18183a80ba05da35bf0fb7417aac5cddbd93bcb1b92fd3ea
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.3 surprise-0.1


In [None]:
from surprise import SVD, accuracy
from surprise import Dataset, Reader
from surprise.model_selection import cross_validate




In [None]:
'''
rating dataframe will look like this
| user_id | item_id | rating          |
|---------|---------|-----------------|
| 1       | 1       | 5               |
| ...     | ...     | ...             |
| n       | m       | 3               |
'''

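# initial model
# `train` is expected to be a surprise Trainset and `test` a list of
# (user, item, rating) tuples.
algo = SVD(random_state=42)
algo.fit(train)
pred = algo.test(test)

# evaluate the RMSE between the predictions and the ground truth
accuracy.rmse(pred)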
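
The `train` and `test` objects used above need ratings in the long `user_id | item_id | rating` layout described in the docstring. As a minimal sketch, assuming that nonzero cells in the user-item matrix are the observed ratings, the wide matrix could be melted into that layout with pandas. The toy matrix and the `to_long` helper are hypothetical.

```python
import pandas as pd

def to_long(user_item: pd.DataFrame) -> pd.DataFrame:
    """Melt a wide user-item matrix into (user_id, item_id, rating) rows."""
    long_df = user_item.melt(id_vars="user", var_name="item_id", value_name="rating")
    long_df = long_df.rename(columns={"user": "user_id"})
    # Assumption: a 0 means "not rated", so those cells are dropped.
    long_df = long_df[long_df["rating"] != 0]
    return long_df.sort_values(["user_id", "item_id"]).reset_index(drop=True)

# Toy matrix with two users and three cats (illustrative values only).
toy = pd.DataFrame({
    "user": [0, 1],
    "58980784": [5, 0],
    "58980778": [0, 3],
    "58980506": [4, 0],
})
ratings = to_long(toy)
print(ratings)
```

With surprise, a frame in this shape can be passed to `Dataset.load_from_df(ratings[["user_id", "item_id", "rating"]], Reader(rating_scale=(1, 5)))`, where the 1-5 scale is itself an assumption. Calling `build_full_trainset()` on the result yields an object that `algo.fit` accepts, and `algo.test` takes a list of `(user_id, item_id, rating)` tuples.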



Next, we generate the similarity matrix from the user-item matrix.
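
One way this step could look, as a sketch only: cosine similarity between the user rows of the user-item matrix, computed with NumPy. The toy matrix is illustrative, and cosine is an assumed choice of measure here; the notebook does not say which similarity the collaborative filtering step will use.

```python
import numpy as np

def cosine_similarity_matrix(mat: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `mat`."""
    mat = np.asarray(mat, dtype=float)
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    # Rows with no ratings have zero norm; avoid dividing by zero.
    safe = np.where(norms == 0, 1.0, norms)
    unit = mat / safe
    return unit @ unit.T

# Rows are users, columns are cats (illustrative ratings, 0 = not rated).
user_item = np.array([
    [5, 0, 4],
    [5, 0, 4],
    [0, 3, 0],
    [0, 0, 0],
])
sim = cosine_similarity_matrix(user_item)
print(np.round(sim, 2))
```

Users with identical ratings score 1, users with no cats in common score 0, and a user with no ratings at all scores 0 against everyone, which is a reminder that four sparse users give the model very little to work with.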

# Conclusion and Next Steps <a id='conclusion'></a>

**Conclusion of ML Modeling as of 1/2/23**: 
- All three content-based filtering models perform well.
- Cosine similarity appears to be the most sensitive to differences and has a very useful 0-1 scale.
- The content-based filtering models can be hooked up to the PetMatch UI as-is and should return good results, based on the overall similarity measures evaluated so far.
- The user rankings data generated requires more formatting than initially expected, but our application tracks all the key required fields for now.
- Collaborative Filtering is harder to implement than initially expected, but we have initial data to give it a try.

**Conclusion of ML Baseline as of 12/6/22**: 
- The average top-5 recommendation score per cat in the training set is 10.96. The highest available score is 12.
- The result above uses a simple content-based filtering recommendation model without user preferences, since they are not currently available. Instead, it compares items against each other: you liked this ketchup, so here are 10 other similar types of ketchup.
- Because of the method used to create the simple content-based filtering model, the dev and test sets cannot be used, so the training set was used to get an initial idea of the results.
- The features in cats data version 0.5 need more ways to delineate one cat from another, but based on visual scans and the average recommendation score, the simple cat CBF model generally excels at giving you cats similar to what you stated you wanted.
- In instances where there is more ambiguity (that is, a chosen cat with less defined details), it still finds cats very similar to it, but it can sometimes also throw in very similar cats of a different breed. This might not be a bad thing.
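
To make the "average top-5 score" idea concrete, here is a minimal sketch of how such a number could be computed. It assumes the score is a linear (dot-product) similarity over binary features, which these notes do not spell out, and it uses a toy feature matrix with `k=2` rather than the real cats data; `mean_top_k_score` is a hypothetical helper.

```python
import numpy as np

def mean_top_k_score(features: np.ndarray, k: int) -> float:
    """Mean of each item's k highest similarity scores to other items."""
    features = np.asarray(features, dtype=float)
    scores = features @ features.T           # linear (dot-product) similarity
    np.fill_diagonal(scores, -np.inf)        # exclude each item itself
    top_k = np.sort(scores, axis=1)[:, -k:]  # k largest scores per row
    return float(top_k.mean())

# Four toy cats with three binary features each (illustrative only).
cats = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])
print(mean_top_k_score(cats, k=2))
```

Under this reading, the maximum possible score equals the number of features a cat can match on, so an average close to that maximum means the top recommendations agree with the chosen cat on nearly every feature.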

**Next Steps**:

- Get more user rankings!
- Incorporate distance more effectively
- Can we use the description field for cats at all?
- Collaborative Filtering, both item-based and user-based
  - Possibly use the surprise library
  - Add a timestamp to rankings so we can be time-sensitive in our recommendations