This workbook implements Content-Based Filtering for a dog recommendation system, working from the raw data all the way to model creation and initial results.

# Table of Contents

* [Load in Data and Segment Features from Context data](#segment)
* [Pre-process feature data](#pre-process)
    - [Make Needed Helper Functions and Imports](#pp_pipeline)
    - [Preprocess Data for model runs](#pp)
* [Run Content-Based-Filtering Modeling Iterations](#run_pipeline)
    - [Linear Similarity Results](#ls)
    - [Cosine similarity results](#cs)
    - [Laplacian Similarity Results](#lp)
    - [Overall Content Based Filtering results as of 1/2/2023](#ov)
* [Collaborative Filtering - Under Construction WIP](#cf)    
* [Conclusion and Next Steps](#conclusion)

# Load in Data and Segment Features from Context data<a id='segment'></a>

First, we load all of our adoptable dogs.

In [1]:
from google.colab import drive
import joblib #so I can save files out

drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import pandas as pd
dogs_DF = pd.read_csv("/content/drive/MyDrive/MLE10PetMatch/Adoptable_dogs_20221202_withExtras.csv",header=0,index_col=0)
dogs_DF.shape

(97694, 70)

In [3]:
pd.set_option('display.max_columns', 500)
dogs_DF.sample(3)

Unnamed: 0,id,organization_id,url,type,species,age,gender,size,coat,tags,name,description_x,organization_animal_id,photos,videos,status,status_changed_at,published_at,distance,breeds.primary,breeds.secondary,breeds.mixed,breeds.unknown,colors.primary,colors.secondary,colors.tertiary,attributes.spayed_neutered,attributes.house_trained,attributes.declawed,attributes.special_needs,attributes.shots_current,environment.children,environment.dogs,environment.cats,primary_photo_cropped.small,primary_photo_cropped.medium,primary_photo_cropped.large,primary_photo_cropped.full,contact.email,contact.phone,contact.address.address1,contact.address.address2,contact.address.city,contact.address.state,contact.address.postcode,contact.address.country,animal_id,animal_type,organization_id.1,primary_photo_cropped,description_y,temperament,popularity,min_height,max_height,min_weight,max_weight,min_expectancy,max_expectancy,group,grooming_frequency_value,grooming_frequency_category,shedding_value,shedding_category,energy_level_value,energy_level_category,trainability_value,trainability_category,demeanor_value,demeanor_category
64648,58561296,NC626,https://www.petfinder.com/dog/beezus-58561296/...,Dog,Dog,Young,Female,Medium,,"['Playful', 'Funny', 'Athletic', 'Loves kisses']",Beezus,Meet Beezus!\n\nQuick facts:\nBreed: There is ...,,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,[],adoptable,2022-10-17T15:59:00+0000,2022-10-17T15:59:00+0000,,Hound,Rottweiler,True,False,,,,True,False,,False,True,True,True,,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,info@c2cnd.org,(704) 879-1729,,,Charlotte,NC,28215,US,58561296,dog,nc626,,,,,,,,,,,,,,,,,,,,,
35905,58864112,IA186,https://www.petfinder.com/dog/marigold-coop-58...,Dog,Dog,Young,Female,Medium,,[],Marigold Coop,,51381174.0,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,[],adoptable,2022-11-15T19:11:25+0000,2022-11-15T19:11:24+0000,,Hound,Mixed Breed,True,False,,,,True,False,,False,False,,,,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,adopt57@aheinz57.com,(515) 834-1157,4002 Ash Street,,De Soto,IA,50069,US,58864112,dog,ia186,,,,,,,,,,,,,,,,,,,,,
97608,56499241,IL827,https://www.petfinder.com/dog/kaiya-56499241/i...,Dog,Dog,Adult,Female,Medium,Short,['Gentle'],Kaiya,Kaiya is a beautiful foster dog. She has a gen...,,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,[],adoptable,2022-07-29T14:57:43+0000,2022-07-29T14:57:43+0000,,German Shepherd Dog,,True,False,Black,Brown / Chocolate,,True,False,,False,True,,True,True,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,trainrescue@gmail.com,,PO Box 3723,,Peoria,IL,61614,US,56499241,dog,il827,,German Shepherd Dogs can stand as high as 26 i...,"Confident, Courageous, Smart",2.0,55.88,66.04,22.679619,40.823313,7.0,10.0,Herding Group,0.4,Weekly Brushing,0.8,Regularly,0.6,Regular Exercise,1.0,Eager to Please,0.6,Alert/Responsive


In [4]:
dogs_DF.columns

Index(['id', 'organization_id', 'url', 'type', 'species', 'age', 'gender',
       'size', 'coat', 'tags', 'name', 'description_x',
       'organization_animal_id', 'photos', 'videos', 'status',
       'status_changed_at', 'published_at', 'distance', 'breeds.primary',
       'breeds.secondary', 'breeds.mixed', 'breeds.unknown', 'colors.primary',
       'colors.secondary', 'colors.tertiary', 'attributes.spayed_neutered',
       'attributes.house_trained', 'attributes.declawed',
       'attributes.special_needs', 'attributes.shots_current',
       'environment.children', 'environment.dogs', 'environment.cats',
       'primary_photo_cropped.small', 'primary_photo_cropped.medium',
       'primary_photo_cropped.large', 'primary_photo_cropped.full',
       'contact.email', 'contact.phone', 'contact.address.address1',
       'contact.address.address2', 'contact.address.city',
       'contact.address.state', 'contact.address.postcode',
       'contact.address.country', 'animal_id', 'animal_type

Drop animals with no pictures, since pictures are key to our 'Tinder-like' app experience.

In [5]:
dogs_DF = dogs_DF.dropna(subset=['primary_photo_cropped.full'])# drop rows with 0 pictures
dogs_DF.shape # matches na count via sweet viz for cats

(97694, 70)
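
As a minimal sketch of what `dropna(subset=...)` does, the toy frame below (made-up names and URLs, not rows from the dataset) loses only the row whose photo URL is missing. In the cell above the shape stays at (97694, 70), which suggests every listing in this export already has a full-size photo.

```python
import numpy as np
import pandas as pd

# Toy listings; the names and URLs are made up for illustration
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ace", "Bo", None],
    "primary_photo_cropped.full": [
        "https://example.org/a.jpg",
        np.nan,
        "https://example.org/c.jpg",
    ],
})

# Only NaNs in the listed column trigger a drop; the missing name in row 3 is ignored
with_photo = toy.dropna(subset=["primary_photo_cropped.full"])
print(with_photo["id"].tolist())
```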

Next we separate the dataframe into features to model over and context data that can be shown to the user for any matches. The 'id' column will be our shared key between the two tables.

Of note, the 'distance' field and 'primary_photo_cropped.full' field will be useful data for future model enhancements. For the models so far, we will simply use textual data and assume a 0 distance for all pets.

In [6]:
contextCols = ['id','organization_id','url','type','tags','name','description_x','organization_animal_id',
              'photos','primary_photo_cropped','videos','status','status_changed_at','published_at',
              'distance','contact.email', 'contact.phone', 'contact.address.address1',
               'contact.address.address2', 'contact.address.city','contact.address.state', 
               'contact.address.postcode','contact.address.country', 'animal_id', 'animal_type',
               'organization_id.1', 'primary_photo_cropped.small','primary_photo_cropped.medium',
               'primary_photo_cropped.large','primary_photo_cropped.full','description_y',
               'temperament','popularity','min_height','max_height','min_weight','max_weight',
               'min_expectancy','max_expectancy','grooming_frequency_value','shedding_value',
               'energy_level_value','trainability_value','demeanor_value']
featureCols = ['id','age','gender','size','coat','breeds.primary', 'breeds.secondary','breeds.mixed',
              'breeds.unknown','colors.primary','colors.secondary','colors.tertiary',
              'attributes.spayed_neutered','attributes.house_trained','attributes.declawed',
              'attributes.special_needs','attributes.shots_current','environment.children',
              'environment.dogs','environment.cats','type','contact.address.postcode',
              'group','grooming_frequency_category','shedding_category','energy_level_category',
               'trainability_category','demeanor_category'] # initial columns to keep for training purposes
dogs_DF_features = dogs_DF[featureCols]
dogs_DF_context = dogs_DF[contextCols]
dogs_DF_features.shape

(97694, 28)
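
The idea of keeping 'id' in both tables can be sketched on a toy frame. The rows and the short column lists below are illustrative assumptions, not the real `featureCols` and `contextCols`.

```python
import pandas as pd

# Toy rows; the values are illustrative only
toy = pd.DataFrame({
    "id": [10, 11, 12],
    "age": ["Adult", "Baby", "Young"],
    "size": ["Large", "Small", "Medium"],
    "name": ["Roxy", "Blackie", "Marlie"],
    "url": ["u10", "u11", "u12"],
})

feature_cols = ["id", "age", "size"]  # what the model compares
context_cols = ["id", "name", "url"]  # what the user sees for a match

features = toy[feature_cols]
context = toy[context_cols]

# Rejoin on the shared key; validate raises if an id is duplicated on either side
rejoined = features.merge(context, on="id", how="inner", validate="one_to_one")
print(rejoined)
```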

Let's sanity check our missing values now that we just have dogs and remove any columns with too many missing values.

In [7]:
valueCounts = dogs_DF_features.set_index('type').isna().groupby(level=0).sum()/dogs_DF_features.shape[0] # level=0 refers to our index, which we made 'type'


In [8]:
pd.set_option('display.max_columns', 500)
valueCounts 

type,id,age,gender,size,coat,breeds.primary,breeds.secondary,breeds.mixed,breeds.unknown,colors.primary,colors.secondary,colors.tertiary,attributes.spayed_neutered,attributes.house_trained,attributes.declawed,attributes.special_needs,attributes.shots_current,environment.children,environment.dogs,environment.cats,contact.address.postcode,group,grooming_frequency_category,shedding_category,energy_level_category,trainability_category,demeanor_category
Dog,0.0,0.0,0.0,0.0,0.661535,0.0,0.632639,0.0,0.0,0.456026,0.660153,0.885756,0.0,0.0,1.0,0.0,0.0,0.668342,0.546216,0.802168,0.000194,0.519561,0.526767,0.527003,0.519571,0.5204,0.5204
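
Because every row here has the same 'type', the groupby has a single group, so the same fractions can be read off a plain column-wise mean of the NaN mask. A small sketch on toy data (the values are made up):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "type": ["Dog", "Dog", "Dog", "Dog"],
    "coat": ["Short", np.nan, np.nan, np.nan],
    "age": ["Adult", "Baby", "Young", "Senior"],
})

# Approach from the cell above: count NaNs per type, divide by the total row count
grouped = toy.set_index("type").isna().groupby(level=0).sum() / toy.shape[0]

# Shorter equivalent when there is only one type: mean of the boolean NaN mask
direct = toy.drop(columns="type").isna().mean()
print(direct)
```

The grouped form becomes useful again once several animal types share one table.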


In [9]:
valueCounts = dogs_DF_context.set_index('type').isna().groupby(level=0).sum()/dogs_DF_context.shape[0] # level=0 refers to our index, which we made 'type'


In [10]:
pd.set_option('display.max_columns', 500)
valueCounts 

type,id,organization_id,url,tags,name,description_x,organization_animal_id,photos,primary_photo_cropped,videos,status,status_changed_at,published_at,distance,contact.email,contact.phone,contact.address.address1,contact.address.address2,contact.address.city,contact.address.state,contact.address.postcode,contact.address.country,animal_id,animal_type,organization_id.1,primary_photo_cropped.small,primary_photo_cropped.medium,primary_photo_cropped.large,primary_photo_cropped.full,description_y,temperament,popularity,min_height,max_height,min_weight,max_weight,min_expectancy,max_expectancy,grooming_frequency_value,shedding_value,energy_level_value,trainability_value,demeanor_value
Dog,0.0,0.0,0.0,0.0,0.0,0.229277,0.344289,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.062931,0.245726,0.439014,0.942883,0.000154,0.000154,0.000194,0.000154,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.519561,0.519561,0.555899,0.519561,0.519561,0.522253,0.522253,0.521301,0.521301,0.526767,0.527003,0.519571,0.5204,0.5204


After a quick NA check, we remove 'coat', 'breeds.secondary', 'colors.secondary', and 'colors.tertiary', each of which is missing in more than 60% of rows, along with 'attributes.declawed', which is entirely missing (it doesn't make sense for dogs). The column 'colors.primary' is also missing many values (about 46%), but it helps distinguish one dog from another, so it is kept for now. We also keep 'contact.address.postcode' among the features as an initial attempt to match nearby dogs together. The columns 'environment.children', 'environment.dogs', and 'environment.cats' have many missing values too, but users derive a lot of value from this information, so they stay. Lastly, the AKC data ('group' and the '*_category' columns) is missing for roughly half of the rows, but only because the dataset has a lot of mixed breeds and AKC only handles purebreds. Its value for purebreds is significant, so those columns are kept as well. The revised list below also leaves out 'breeds.unknown' and 'type'.

In [11]:
featureCols = ['id','age','gender','size','breeds.primary','breeds.mixed',
               'colors.primary','attributes.spayed_neutered','attributes.house_trained',
               'attributes.special_needs','attributes.shots_current',
               'contact.address.postcode','environment.children','environment.dogs','environment.cats',
               'group','grooming_frequency_category','shedding_category',
               'energy_level_category', 'trainability_category', 'demeanor_category']

dogs_DF_features = dogs_DF[featureCols]
dogs_DF_context = dogs_DF[contextCols]
dogs_DF_features.shape

(97694, 21)
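
The hand-picked selection above can also be expressed as a rule. The sketch below uses missing-value fractions copied from the feature table earlier in this section; the 0.6 cutoff and the `keep_anyway` set are assumptions chosen to reproduce the choice described in the text, not values stated by the notebook.

```python
import pandas as pd

# Missing-value fractions copied from the feature NA table (subset of columns)
na_frac = pd.Series({
    "coat": 0.661535,
    "breeds.secondary": 0.632639,
    "colors.primary": 0.456026,
    "colors.secondary": 0.660153,
    "colors.tertiary": 0.885756,
    "attributes.declawed": 1.0,
    "environment.children": 0.668342,
    "environment.dogs": 0.546216,
    "environment.cats": 0.802168,
    "group": 0.519561,
})

threshold = 0.6  # hypothetical cutoff
keep_anyway = {"environment.children", "environment.dogs", "environment.cats"}

# Drop sparse columns unless they are explicitly protected
to_drop = [c for c, frac in na_frac.items() if frac > threshold and c not in keep_anyway]
print(to_drop)
```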

In [12]:
dogs_DF_features.dtypes

id                              int64
age                            object
gender                         object
size                           object
breeds.primary                 object
breeds.mixed                     bool
colors.primary                 object
attributes.spayed_neutered       bool
attributes.house_trained         bool
attributes.special_needs         bool
attributes.shots_current         bool
contact.address.postcode       object
environment.children           object
environment.dogs               object
environment.cats               object
group                          object
grooming_frequency_category    object
shedding_category              object
energy_level_category          object
trainability_category          object
demeanor_category              object
dtype: object

# Pre-process feature data<a id='pre-process'></a>

## Make Needed Helper Functions and Imports <a id='pp_pipeline'></a>

Define the imports and helper functions used for modeling later in the workbook.

In [13]:
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity, laplacian_kernel, linear_kernel
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

In [14]:
def remove_columns_with_1_distinct(df):
    drop_col = [e for e in df.columns if df[e].nunique()==1]
    df_return = df.drop(drop_col,axis=1)
    return df_return
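
A quick check of how this helper behaves on edge cases may be useful. `nunique()` ignores NaN by default, so a column holding one real value plus NaNs counts as constant and is dropped, while an all-NaN column has zero distinct values and is kept. The toy frame is made up, and the helper is repeated so the example runs on its own.

```python
import numpy as np
import pandas as pd

def remove_columns_with_1_distinct(df):
    # Same logic as the helper above, repeated so the example is self-contained
    drop_col = [c for c in df.columns if df[c].nunique() == 1]
    return df.drop(columns=drop_col)

toy = pd.DataFrame({
    "type": ["Dog", "Dog", "Dog"],            # constant: dropped
    "declawed": [False, np.nan, np.nan],      # one value plus NaN: also dropped
    "all_missing": [np.nan, np.nan, np.nan],  # nunique() == 0: kept
    "age": ["Adult", "Baby", "Adult"],        # two values: kept
})

result = remove_columns_with_1_distinct(toy)
print(list(result.columns))
```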


In [15]:
def drop_duplicates(df):
    df_return = df.drop_duplicates()
    return df_return


In [16]:
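def linear_similarities(df_1,id_df):
    # Pairwise dot products between all rows of the encoded feature matrix
    cs_simil = linear_kernel(df_1,df_1)
    results = {}
    ds = id_df # needs id column and a 0..n-1 index aligned with the rows of df_1
    for idx, row in ds.iterrows():
        # Positions of the 99 highest scores, best first
        similar_indices = cs_simil[idx].argsort()[:-100:-1]
        similar_items = [(cs_simil[idx][i], ds['id'][i]) for i in similar_indices]
        # Skip the first entry, assumed to be the item itself (an identical row could tie with it)
        results[row['id']] = similar_items[1:]
    return results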

In [17]:
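def cosine_similarities(df_1,id_df):
    # Pairwise cosine similarity between all rows of the encoded feature matrix
    cs_simil = cosine_similarity(df_1,df_1)
    results = {}
    ds = id_df # needs id column and a 0..n-1 index aligned with the rows of df_1
    for idx, row in ds.iterrows():
        # Positions of the 99 highest scores, best first
        similar_indices = cs_simil[idx].argsort()[:-100:-1]
        similar_items = [(cs_simil[idx][i], ds['id'][i]) for i in similar_indices]
        # Skip the first entry, assumed to be the item itself
        results[row['id']] = similar_items[1:]
    return results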

In [18]:
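def laplacian_similarities(df_1,id_df):
    # Pairwise Laplacian kernel, exp(-gamma * L1 distance), between all rows
    cs_simil = laplacian_kernel(df_1,df_1)
    results = {}
    ds = id_df # needs id column and a 0..n-1 index aligned with the rows of df_1
    for idx, row in ds.iterrows():
        # Positions of the 99 highest scores, best first
        similar_indices = cs_simil[idx].argsort()[:-100:-1]
        similar_items = [(cs_simil[idx][i], ds['id'][i]) for i in similar_indices]
        # Skip the first entry, assumed to be the item itself
        results[row['id']] = similar_items[1:]
    return results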
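
To make the three scores concrete, here is a small worked example on two hand-encoded dogs. The column layout is an assumption for illustration. With one-hot data where every row has exactly one active column per feature, the linear kernel counts matching features, cosine similarity divides that count by the number of features, and the Laplacian kernel decays with the L1 distance (two per mismatched feature). Under that assumption all three are monotone in the match count, so they should rank neighbors in the same order and differ mainly in scale and tie-breaking.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, laplacian_kernel, linear_kernel

# Two dogs described by 3 categorical features, one-hot encoded by hand.
# Columns: [age=Adult, age=Baby, size=Large, size=Small, gender=F, gender=M]
a = np.array([[1, 0, 1, 0, 1, 0]], dtype=float)  # Adult, Large, Female
b = np.array([[1, 0, 0, 1, 1, 0]], dtype=float)  # Adult, Small, Female

lin = linear_kernel(a, b)[0, 0]      # dot product = number of matching features
cos = cosine_similarity(a, b)[0, 0]  # dot product / (|a| * |b|) = matches / 3
lap = laplacian_kernel(a, b)[0, 0]   # exp(-gamma * L1), gamma defaults to 1 / n_columns

print(lin, cos, lap)
```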

In [46]:
def item(id,df):  
    ds = df
    colsGrab = ['id']
    return ds.loc[ds['id'] == id][colsGrab].values[0] # finds the row with this id and returns its 'id' as a one-element array

def url(id,df):  
    ds = df
    colsGrab = ['url']
    return ds.loc[ds['id'] == id][colsGrab].values[0] # same lookup, returning the listing URL

def picture(id,df):  
    ds = df
    colsGrab = ['primary_photo_cropped.full']
    return ds.loc[ds['id'] == id][colsGrab].values[0] # same lookup, returning the full-size photo URL

def recommend(item_id, num,df,reccs):
    print("Recommending " + str(num) + " dogs similar to " + str(item(item_id,df)) + "... " 
          + picture(item_id,df) + " - " + url(item_id,df))   
    print("-------")    
    recs = reccs[item_id][:num]   
    for rec in recs: 
        print("Recommended: " + str(item(rec[1],df)) + " (score:" +      str(rec[0]) + ") " 
              + picture(rec[1],df) + " - " + url(rec[1],df))
    
def score(reccs, num):
    # Average similarity score over the top `num` recommendations of every item
    print("Finding average recommendation score for top " + str(num) + " recommendations per example")
    results = []
    for key in reccs.keys():
        subRecs = reccs[key][:num]
        for r in subRecs:
            results.append(r[0])
    averageRecc = sum(results) / len(results)
    print("There are " + str(len(results)) + " results with a sum of " + str(sum(results)) + " and an average of: "
          + str(averageRecc))
    return averageRecc
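
The averaging step can be illustrated on a tiny hand-made result. `average_top_n_score` below is a hypothetical, print-free variant of `score()` written only for this example, and the scores and ids are made up.

```python
def average_top_n_score(reccs, num):
    # Flatten the top `num` scores of every item, then take their mean
    scores = [s for recs in reccs.values() for s, _ in recs[:num]]
    return sum(scores) / len(scores)

# Toy output in the shape the similarity helpers return:
# {item_id: [(score, other_id), ...]}, sorted from most to least similar
toy_reccs = {
    101: [(18.0, 102), (15.0, 103), (9.0, 104)],
    102: [(18.0, 101), (12.0, 103), (7.0, 104)],
}

avg = average_top_n_score(toy_reccs, num=2)
print(avg)
```

Since this average lives on each kernel's own scale (match counts for the linear kernel, values between 0 and 1 for cosine), it is probably best compared between runs of the same kernel rather than across kernels.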

## Preprocess Data for model runs <a id='pp'></a>

Now that the essential helpers are defined, let's handle the data.

In [20]:
dogs_DF_features.head(3) # sneak peek of what we have to work with initially

Unnamed: 0,id,age,gender,size,breeds.primary,breeds.mixed,colors.primary,attributes.spayed_neutered,attributes.house_trained,attributes.special_needs,attributes.shots_current,contact.address.postcode,environment.children,environment.dogs,environment.cats,group,grooming_frequency_category,shedding_category,energy_level_category,trainability_category,demeanor_category
0,59027590,Adult,Female,Large,Golden Retriever,True,,True,True,False,True,7442,True,True,False,Sporting Group,Weekly Brushing,Seasonal,Needs Lots of Activity,Eager to Please,Friendly
1,59027588,Adult,Male,Small,Dandie Dinmont Terrier,True,Black,True,True,False,True,75093,,True,,Terrier Group,Daily Brushing,Infrequent,Regular Exercise,Independent,Reserved with Strangers
2,59027587,Baby,Female,Large,Great Pyrenees,False,White / Cream,True,False,False,True,36541,True,True,True,Working Group,Weekly Brushing,Seasonal,Needs Lots of Activity,Independent,Reserved with Strangers


Besides the id value, which is our shared key, all other fields are categorical. We can use One-Hot Encoding to turn them into a numeric form that the similarity models can work with.

Some preprocessing must happen before One-Hot Encoding to ensure everything goes as planned. First, we proactively drop duplicate rows. Second, we remove any feature with only 1 distinct value, since content-based filtering relies on differences between items and a constant column adds no information. Third, we cast the postcode to a string and map the boolean columns to strings, because the encoder needs each column to hold a single type. Lastly, once all types are fixed, we replace NaNs with a special string so that missing values get their own category.

In [21]:
# Preprocess data before encoding occurs for some troublesome fields
X = dogs_DF_features.copy() # work on a copy so dogs_DF_features itself is never modified
X = drop_duplicates(X) # remove duplicate rows
X = remove_columns_with_1_distinct(X) # remove any features with only 1 distinct value
# fix postcode to be a str rather than an int; note astype(str) may turn a missing postcode into the string 'nan'
X["contact.address.postcode"] = X["contact.address.postcode"].astype(str)
# One-Hot Encoder requires each column to be all strings or all numbers, so map bools to strings
boolCols = ['breeds.mixed','attributes.spayed_neutered','attributes.house_trained',
            'attributes.special_needs','attributes.shots_current',
            'environment.children','environment.dogs','environment.cats']
for col in boolCols:
    X[col] = X[col].map({True: 'True', False: 'False'}) # NaN stays NaN here

X = X.replace(np.nan,'Not Available') # replace nan's with their own special category, do this last once types all fixed!
X.dtypes
#target = 'todo' # will be rankings once we have them
#X, y = cats_DF_features.drop(columns=target), cats_DF_features[target]

id                              int64
age                            object
gender                         object
size                           object
breeds.primary                 object
breeds.mixed                   object
colors.primary                 object
attributes.spayed_neutered     object
attributes.house_trained       object
attributes.special_needs       object
attributes.shots_current       object
contact.address.postcode       object
environment.children           object
environment.dogs               object
environment.cats               object
group                          object
grooming_frequency_category    object
shedding_category              object
energy_level_category          object
trainability_category          object
demeanor_category              object
dtype: object
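
The order of the last two steps matters, which a three-value toy column can show: mapping with a dict leaves NaN untouched, and only then does the replace give missing values their own category.

```python
import numpy as np
import pandas as pd

col = pd.Series([True, False, np.nan], dtype=object)

# map() with a dict leaves values that are not keys (here NaN) as NaN
as_text = col.map({True: "True", False: "False"})

# Only afterwards are the remaining NaNs given their own category
filled = as_text.replace(np.nan, "Not Available")
print(filled.tolist())
```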

In [22]:
X_transform = X

In [23]:
X_transform.shape

(97459, 21)

In [24]:
X_transform.sample(3)

Unnamed: 0,id,age,gender,size,breeds.primary,breeds.mixed,colors.primary,attributes.spayed_neutered,attributes.house_trained,attributes.special_needs,attributes.shots_current,contact.address.postcode,environment.children,environment.dogs,environment.cats,group,grooming_frequency_category,shedding_category,energy_level_category,trainability_category,demeanor_category
19798,58954015,Adult,Male,Small,Dachshund,True,Not Available,True,False,False,False,92782,Not Available,Not Available,Not Available,Hound Group,Weekly Brushing,Occasional,Regular Exercise,Agreeable,Alert/Responsive
21841,58944698,Young,Male,Medium,American Bulldog,False,White / Cream,True,False,False,True,12809,Not Available,Not Available,Not Available,Foundation Stock Service,Occasional Bath/Brush,Seasonal,Energetic,Agreeable,Alert/Responsive
44702,58798522,Baby,Female,Medium,Border Collie,True,"Tricolor (Brown, Black, & White)",False,False,False,True,8555,True,True,True,Herding Group,2-3 Times a Week Brushing,Seasonal,Needs Lots of Activity,Eager to Please,Reserved with Strangers


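Now we make our train, dev, and test sets: 25% of the rows are held out for test, and the remaining 75% are split 75/25 into train and dev. Content-Based Filtering does not need the dev or test set, but it is very RAM hungry (the helpers above build a full pairwise similarity matrix), so it can't use the full data set. It is also good practice to keep the train set as the only data the models see. All models will therefore train on the train set.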

In [25]:
# split data
x_t, x_test = train_test_split(X_transform,test_size=0.25,train_size=0.75, random_state=13)
x_train, x_dev = train_test_split(x_t,test_size = 0.25,train_size =0.75, random_state=13)

In [26]:
x_train_index = x_train.index
x_train.shape

(54820, 21)

In [27]:
x_dev_index = x_dev.index
x_dev.shape

(18274, 21)

In [28]:
x_test_index = x_test.index
x_test.shape

(24365, 21)
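
The two-stage split leaves 56.25% of rows for train, 18.75% for dev, and 25% for test. A sketch on a stand-in array of 1,600 row numbers (an arbitrary size chosen so the fractions come out whole):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rows = np.arange(1600)  # stand-in for the rows of X_transform

# First split: 75% train+dev, 25% test
rest, test = train_test_split(rows, test_size=0.25, train_size=0.75, random_state=13)
# Second split: 75% of the remainder for train, 25% for dev
train, dev = train_test_split(rest, test_size=0.25, train_size=0.75, random_state=13)

print(len(train), len(dev), len(test))
```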

In [60]:
x_train.sample(3)

Unnamed: 0,id,age,gender,size,breeds.primary,breeds.mixed,colors.primary,attributes.spayed_neutered,attributes.house_trained,attributes.special_needs,attributes.shots_current,contact.address.postcode,environment.children,environment.dogs,environment.cats,group,grooming_frequency_category,shedding_category,energy_level_category,trainability_category,demeanor_category
29630,56906044,Young,Male,Medium,Labrador Retriever,True,"Tricolor (Brown, Black, & White)",True,True,False,True,6351,True,True,False,Sporting Group,Weekly Brushing,Regularly,Needs Lots of Activity,Eager to Please,Outgoing
9433,58983508,Young,Female,Large,German Shepherd Dog,True,Black,True,True,False,True,21162,True,True,Not Available,Herding Group,Weekly Brushing,Regularly,Regular Exercise,Eager to Please,Alert/Responsive
45018,58882011,Senior,Female,Large,American Staffordshire Terrier,True,Not Available,True,False,False,False,92630,Not Available,True,True,Terrier Group,Occasional Bath/Brush,Occasional,Regular Exercise,Agreeable,Alert/Responsive


In [30]:
train_context = dogs_DF_context.loc[x_train_index]
train_context.shape

(54820, 44)

In [31]:
train_context.head(3)

Unnamed: 0,id,organization_id,url,type,tags,name,description_x,organization_animal_id,photos,primary_photo_cropped,videos,status,status_changed_at,published_at,distance,contact.email,contact.phone,contact.address.address1,contact.address.address2,contact.address.city,contact.address.state,contact.address.postcode,contact.address.country,animal_id,animal_type,organization_id.1,primary_photo_cropped.small,primary_photo_cropped.medium,primary_photo_cropped.large,primary_photo_cropped.full,description_y,temperament,popularity,min_height,max_height,min_weight,max_weight,min_expectancy,max_expectancy,grooming_frequency_value,shedding_value,energy_level_value,trainability_value,demeanor_value
79950,57546510,WV13,https://www.petfinder.com/dog/river-57546510/w...,Dog,[],River,River is ready to show how much love senior do...,HCW-A-9367,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-09-23T16:00:20+0000,2022-09-23T16:00:20+0000,,hcwshelteradoptions@gmail.com,(304) 696-5551,1901 James River Road,,Huntington,WV,25704,US,57546510,dog,wv13,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,,,,,,,,,,,,,,
56384,58681771,ID31,https://www.petfinder.com/dog/boomer-58681771/...,Dog,[],Boomer,,PAS-A-19463,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-10-27T21:55:51+0000,2022-10-27T21:55:51+0000,,Info@pasidaho.org,(208) 265-7297,"870 Kootenai Cut-off Rd. Ponderay, ID",,Ponderay,ID,83852,US,58681771,dog,id31,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,,,,,,,,,,,,,,
59708,58640292,GA926,https://www.petfinder.com/dog/garnet-09-0319-5...,Dog,[],Garnet 09-0319,,RARR-A-2413,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-10-24T10:46:50+0000,2022-10-24T10:46:49+0000,,petfinder@royalanimalrefuge.org,,414 Jenkins Rd,,Tyrone,GA,30290,US,58640292,dog,ga926,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,,,,,,,,,,,,,,


The row counts match! We need to pass our models the encoded feature data to analyze similarity of products, along with the context data that goes with it. As long as the indexes are the same, we can stitch them back together.

Now let's do the same for the dev and test sets!

In [32]:
dev_context = dogs_DF_context.loc[x_dev_index]
dev_context.shape

(18274, 44)

In [33]:
test_context = dogs_DF_context.loc[x_test_index]
test_context.shape

(24365, 44)
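
The index-based lookup used for the train, dev, and test context tables can be sketched on toy frames. The frames and the choice of rows are made up; the point is that `.loc` with the saved index labels returns the same rows in the same order.

```python
import pandas as pd

features = pd.DataFrame({"id": [10, 11, 12, 13], "age": ["Adult", "Baby", "Young", "Senior"]})
context = pd.DataFrame({"id": [10, 11, 12, 13], "name": ["Roxy", "Blackie", "Marlie", "River"]})

# Pretend rows 3 and 1 ended up in the train split, in shuffled order
train_features = features.loc[[3, 1]]

# Look up the matching context rows by the saved index labels
train_context = context.loc[train_features.index]

# The shared key confirms both tables describe the same dogs, row for row
ids_match = (train_features["id"].values == train_context["id"].values).all()
print(train_context["name"].tolist(), ids_match)
```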

In [34]:
dogs_DF_context.head(3) # matches dogs_DF_features indexes as well. good!

Unnamed: 0,id,organization_id,url,type,tags,name,description_x,organization_animal_id,photos,primary_photo_cropped,videos,status,status_changed_at,published_at,distance,contact.email,contact.phone,contact.address.address1,contact.address.address2,contact.address.city,contact.address.state,contact.address.postcode,contact.address.country,animal_id,animal_type,organization_id.1,primary_photo_cropped.small,primary_photo_cropped.medium,primary_photo_cropped.large,primary_photo_cropped.full,description_y,temperament,popularity,min_height,max_height,min_weight,max_weight,min_expectancy,max_expectancy,grooming_frequency_value,shedding_value,energy_level_value,trainability_value,demeanor_value
0,59027590,NJ519,https://www.petfinder.com/dog/roxy-59027590/nj...,Dog,"['Friendly', 'Affectionate', 'Loyal', 'Gentle'...",Roxy,Roxy is 4 yrs old and weighs 60Lbs. She is spa...,,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-12-02T05:57:18+0000,2022-12-02T05:57:17+0000,,Doggiedogrescue@aol.com,,,,Pompton Lakes,NJ,7442,US,59027590,dog,nj519,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,"The Golden Retriever is a sturdy, muscular dog...","Friendly, Intelligent, Devoted",3,54.61,60.96,24.94758,34.019428,10.0,12.0,0.4,0.6,1.0,1.0,0.8
1,59027588,TX1203,https://www.petfinder.com/dog/blackie-59027588...,Dog,"['Friendly', 'Gentle']",Blackie,,,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-12-02T05:55:53+0000,2022-12-02T05:55:52+0000,,tzrinquiries@tzuzoorescue.com,,,,Plano,TX,75093,US,59027588,dog,tx1203,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,Physical hallmarks of the Dandie Dinmont Terri...,"Independent, Smart, Proud",176,20.32,27.94,8.164663,10.886217,12.0,15.0,0.8,0.2,0.6,0.4,0.4
2,59027587,AL387,https://www.petfinder.com/dog/marlie-59027587/...,Dog,[],Marlie,NEWBIR ALERT!!\n\nThis sweet pup is Marlie! Sh...,,[{'small': 'https://dl5zpyw5k3jeb.cloudfront.n...,,[],adoptable,2022-12-02T05:54:25+0000,2022-12-02T05:54:24+0000,,wagsandwhiskersrescuepets@yahoo.com,,,,Grand Bay,AL,36541,US,59027587,dog,al387,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,https://dl5zpyw5k3jeb.cloudfront.net/photos/pe...,"Frequently described as “majestic,” Pyrs are b...","Smart, Patient, Calm",66,63.5,81.28,38.555351,45.359237,10.0,12.0,0.4,0.6,1.0,0.4,0.4


Since we know the indexes match, let's get rid of the id columns.

In [35]:
x_train = x_train.reset_index(drop=True) # required so keys work properly
x_train_woID = x_train.drop(columns='id')
x_train_woID.dtypes

age                            object
gender                         object
size                           object
breeds.primary                 object
breeds.mixed                   object
colors.primary                 object
attributes.spayed_neutered     object
attributes.house_trained       object
attributes.special_needs       object
attributes.shots_current       object
contact.address.postcode       object
environment.children           object
environment.dogs               object
environment.cats               object
group                          object
grooming_frequency_category    object
shedding_category              object
energy_level_category          object
trainability_category          object
demeanor_category              object
dtype: object
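
The `reset_index(drop=True)` call matters because the similarity helpers use the label yielded by `iterrows()` as a row position in the similarity matrix. A toy sketch of the problem and the fix (the matrix values and ids are made up):

```python
import numpy as np
import pandas as pd

sim = np.array([[1.0, 0.2, 0.9],
                [0.2, 1.0, 0.1],
                [0.9, 0.1, 1.0]])  # toy 3x3 similarity matrix

# After a shuffled split the frame still carries its original labels
ids = pd.DataFrame({"id": [500, 600, 700]}, index=[42, 7, 19])

# Labels 42, 7, 19 are not valid row positions in a 3x3 matrix
labels_usable = all(0 <= label < len(sim) for label in ids.index)

ids_reset = ids.reset_index(drop=True)  # labels become 0, 1, 2
positions_usable = list(ids_reset.index) == list(range(len(sim)))

# Row 0 of sim now lines up with ids_reset['id'][0]; its best other match is position 2
best_other = ids_reset["id"][int(np.argsort(sim[0])[-2])]
print(labels_usable, positions_usable, best_other)
```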

In [36]:
x_dev_woID = x_dev.drop(columns='id')
x_dev_woID.dtypes

age                            object
gender                         object
size                           object
breeds.primary                 object
breeds.mixed                   object
colors.primary                 object
attributes.spayed_neutered     object
attributes.house_trained       object
attributes.special_needs       object
attributes.shots_current       object
contact.address.postcode       object
environment.children           object
environment.dogs               object
environment.cats               object
group                          object
grooming_frequency_category    object
shedding_category              object
energy_level_category          object
trainability_category          object
demeanor_category              object
dtype: object

In [37]:
x_test_woID = x_test.drop(columns='id')
x_test_woID.dtypes

age                            object
gender                         object
size                           object
breeds.primary                 object
breeds.mixed                   object
colors.primary                 object
attributes.spayed_neutered     object
attributes.house_trained       object
attributes.special_needs       object
attributes.shots_current       object
contact.address.postcode       object
environment.children           object
environment.dogs               object
environment.cats               object
group                          object
grooming_frequency_category    object
shedding_category              object
energy_level_category          object
trainability_category          object
demeanor_category              object
dtype: object

In [38]:
X_transform_woID = X_transform.drop(columns='id')

Notice that the indexes are the same and the id columns are gone, so we can recover the IDs later. Now we can apply one-hot encoding!

In [39]:
ohe = OneHotEncoder().fit(X_transform_woID) # One-hot encoding works much better here; fit on the whole X so every category is known
X_train_transform = ohe.transform(x_train_woID) # don't need to add id columns because same columns preserved
X_dev_transform  = ohe.transform(x_dev_woID)
X_test_transform = ohe.transform(x_test_woID)

# Run Content-Based-Filtering Modeling Iterations <a id='run_pipeline'></a>

Content-Based Filtering is a method of comparing products against each other when you don't have user rankings. It is a simple way to create models before user ranking data is available, and it often does well at recommending similar products. In our case, products are dogs. Let's explore a few options for Content-Based Filtering and see how they do.
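To make the idea concrete, here is a minimal sketch of item-to-item similarity on one-hot encoded features. The toy table, its column values, and the `top_n_similar` helper are all assumptions made for illustration; the notebook's own `linear_similarities` and `recommend` helpers may work differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics.pairwise import linear_kernel

# Toy catalogue; ids and values are illustrative, not taken from the real data.
toy = pd.DataFrame({
    "id":   [101, 102, 103, 104],
    "age":  ["Young", "Young", "Adult", "Young"],
    "size": ["Large", "Large", "Small", "Small"],
    "coat": ["Short", "Long", "Short", "Short"],
})

features = toy.drop(columns="id")
encoded = OneHotEncoder().fit_transform(features)  # sparse; one 1 per feature per row

# On one-hot rows the linear kernel counts how many feature values two items share.
sim = linear_kernel(encoded)

def top_n_similar(item_id, n=2):
    """Hypothetical helper: (id, score) pairs for the n most similar items."""
    row = toy.index[toy["id"] == item_id][0]
    order = np.argsort(-sim[row], kind="stable")   # highest score first
    order = [i for i in order if i != row][:n]     # skip the item itself
    return [(int(toy.loc[i, "id"]), float(sim[row, i])) for i in order]

print(top_n_similar(101))
```

Because the ids are dropped before encoding but the row order is preserved, the row position is enough to map a score back to an item id.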

## Linear Similarity Results <a id='ls'></a>

In [40]:
Linear_Model =linear_similarities(X_train_transform,x_train) #run similarities with linear kernel


In [41]:
joblib.dump(Linear_Model, '/content/drive/MyDrive/MLE10PetMatch/models/linear_similarity_model_dogsv1.pkl')

['/content/drive/MyDrive/MLE10PetMatch/models/linear_similarity_model_dogsv1.pkl']

In [42]:
Linear_Model = joblib.load('/content/drive/MyDrive/MLE10PetMatch/models/linear_similarity_model_dogsv1.pkl')

In [47]:
pd.options.display.max_colwidth = 100
recommend(item_id=58925506, num=5,df=train_context,reccs=Linear_Model)

['Recommending 5 dogs similar to [58925506]... https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58925506/2/?bust=1669021069 - https://www.petfinder.com/dog/denali-eskimo-58925506/ca/manhattan-beach/caring-songs-rescue-ca2668/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
-------
['Recommended: [58978435] (score:18.0) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58978435/1/?bust=1669582687 - https://www.petfinder.com/dog/aaahh-the-taste-of-pepsi-cola-huskimo-58978435/ca/manhattan-beach/caring-songs-rescue-ca2668/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58919722] (score:17.0) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58919722/1/?bust=1668964953 - https://www.petfinder.com/dog/cass-58919722/oh/heath/licking-county-humane-society-oh483/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58829056] (score:16.0) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58829056/1/?bust=1668208568 - https://www.petfinder.com/dog/calvin-58829056

The above is the score for one item only, so now let's get an idea of how well this does for the entire training set.

In [48]:
# Gather average score of top 5 recommendations for training set, with a max score of 20 (one per feature column)!
linearScore = score(reccs=Linear_Model, num=5)
linearScore

Finding average reccomendation score for top 5 reccomendations per example
There are 274100results with a sum of5007278.0and and average of: 18.268070047427948


18.268070047427948

The overall score for the whole training set for the Linear Kernel is 18.27/20, or .913.

## Cosine similarity results <a id='cs'></a>

In [49]:
Cosine_Model =cosine_similarities(X_train_transform,x_train) #run similarities with cosine similarity


In [50]:
joblib.dump(Cosine_Model, '/content/drive/MyDrive/MLE10PetMatch/models/cosine_similarity_model_dogsv1.pkl')

['/content/drive/MyDrive/MLE10PetMatch/models/cosine_similarity_model_dogsv1.pkl']

In [51]:
Cosine_Model = joblib.load('/content/drive/MyDrive/MLE10PetMatch/models/cosine_similarity_model_dogsv1.pkl')

In [62]:
pd.options.display.max_colwidth = 100
recommend(item_id=58925506, num=5,df=train_context,reccs=Cosine_Model)

['Recommending 5 dogs similar to [58925506]... https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58925506/2/?bust=1669021069 - https://www.petfinder.com/dog/denali-eskimo-58925506/ca/manhattan-beach/caring-songs-rescue-ca2668/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
-------
['Recommended: [58978435] (score:0.9000000000000002) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58978435/1/?bust=1669582687 - https://www.petfinder.com/dog/aaahh-the-taste-of-pepsi-cola-huskimo-58978435/ca/manhattan-beach/caring-songs-rescue-ca2668/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58919722] (score:0.8500000000000002) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58919722/1/?bust=1668964953 - https://www.petfinder.com/dog/cass-58919722/oh/heath/licking-county-humane-society-oh483/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58829056] (score:0.8000000000000002) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58829056/1/?bust=1668208568 - htt

The above is the score for one item only, so now let's get an idea of how well this does for the entire training set.

In [53]:
# Gather average score of top 5 recommendations for training set, with a max score of 1!
cosineScore = score(reccs=Cosine_Model, num=5)
cosineScore

Finding average reccomendation score for top 5 reccomendations per example
There are 274100results with a sum of250363.90000016915and and average of: 0.9134035023720144


0.9134035023720144

The overall score for the whole training set for Cosine Similarity is .913.
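The cosine average printed above (0.9134) is the linear-kernel average (18.268) divided by the 20 feature columns. The sketch below shows why, using toy one-hot rows with 3 features; the vectors are made up for illustration. Each one-hot row contains exactly one 1 per feature, so every row has norm sqrt(n_features), and cosine similarity reduces to the shared-value count divided by n_features.

```python
import numpy as np

# Toy one-hot rows for 3 categorical features (illustrative values only).
# Each feature contributes exactly one 1 per row, so every row sums to 3.
a = np.array([1, 0,  1, 0,  1, 0])
b = np.array([1, 0,  0, 1,  1, 0])   # differs from `a` in the second feature only

n_features = 3
linear = a @ b                                              # shared feature values
cosine = linear / (np.linalg.norm(a) * np.linalg.norm(b))   # both norms are sqrt(3)

print(linear, cosine)
assert np.isclose(cosine, linear / n_features)

# Same relationship applied to the averages printed above, with 20 feature columns.
print(18.268070047427948 / 20)
```

This is why the linear and cosine models rank items identically here: one score is a constant multiple of the other.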

## Laplacian Similarity Results <a id='lp'></a>

In [54]:
lp_Model = laplacian_similarities(X_train_transform, x_train) # run similarities with Laplacian kernel

In [55]:
joblib.dump(lp_Model, '/content/drive/MyDrive/MLE10PetMatch/models/laplace_similarity_model_dogsv1.pkl')

['/content/drive/MyDrive/MLE10PetMatch/models/laplace_similarity_model_dogsv1.pkl']

In [56]:
lp_Model = joblib.load('/content/drive/MyDrive/MLE10PetMatch/models/laplace_similarity_model_dogsv1.pkl')

In [57]:
pd.options.display.max_colwidth = 100
recommend(item_id=58925506, num=5,df=train_context,reccs=lp_Model)

['Recommending 5 dogs similar to [58925506]... https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58925506/2/?bust=1669021069 - https://www.petfinder.com/dog/denali-eskimo-58925506/ca/manhattan-beach/caring-songs-rescue-ca2668/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
-------
['Recommended: [58978435] (score:0.9993698802981084) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58978435/1/?bust=1669582687 - https://www.petfinder.com/dog/aaahh-the-taste-of-pepsi-cola-huskimo-58978435/ca/manhattan-beach/caring-songs-rescue-ca2668/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58919722] (score:0.9990549693568677) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58919722/1/?bust=1668964953 - https://www.petfinder.com/dog/cass-58919722/oh/heath/licking-county-humane-society-oh483/?referrer_id=c2f7479c-c7e8-422b-bfb4-7c0b8aed0e55']
['Recommended: [58829056] (score:0.9987401576470555) https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/58829056/1/?bust=1668208568 - htt

The above is the score for one item only, so now let's get an idea of how well this does for the entire training set.

In [58]:
# Gather average score of top 5 recommendations for training set, with a max score of 1!
lpScore = score(reccs=lp_Model, num=5)
lpScore

Finding average reccomendation score for top 5 reccomendations per example
There are 274100results with a sum of273950.44263852807and and average of: 0.9994543693488802


0.9994543693488802

The overall score for the whole training set for Laplacian Similarity is .999.
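A rough sketch of why the Laplacian scores sit so close to 1. It assumes `laplacian_similarities` wraps scikit-learn's `laplacian_kernel` with its default `gamma = 1 / n_columns`, which this section does not confirm. The one-hot width of 6,346 columns is also an assumption: it is inferred from the three scores printed above, not read from the encoder. Under those assumptions the kernel is exp(-L1 distance / n_columns), and each mismatched feature adds 2 to the L1 distance between two one-hot rows.

```python
import numpy as np

def laplacian_score(n_mismatched, n_columns):
    """exp(-gamma * L1 distance), with gamma = 1 / n_columns (assumed default)."""
    l1_distance = 2 * n_mismatched   # a mismatched feature flips two one-hot cells
    return np.exp(-l1_distance / n_columns)

n_columns = 6346   # ASSUMPTION: one-hot width inferred from the printed scores

# 2, 3 and 4 mismatches correspond to linear scores of 18, 17 and 16 out of 20.
for mismatched in (0, 2, 3, 4, 20):
    print(mismatched, float(laplacian_score(mismatched, n_columns)))
```

If the assumptions hold, even a dog that differs on all 20 features would still score about 0.994, so the scale is heavily compressed compared with cosine similarity.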

## Overall Content Based Filtering results as of 1/2/2023 <a id='ov'></a>

In [59]:
from tabulate import tabulate
table = [['Model Name', 'Score'],
         ['Linear Kernel',linearScore],
         ['Cosine Similarity',cosineScore],
         ['Laplacian Kernel',lpScore]]
print(tabulate(table,headers='firstrow',tablefmt='fancy_grid'))

╒═══════════════════╤═══════════╕
│ Model Name        │     Score │
╞═══════════════════╪═══════════╡
│ Linear Kernel     │ 18.2681   │
├───────────────────┼───────────┤
│ Cosine Similarity │  0.913404 │
├───────────────────┼───────────┤
│ Laplacian Kernel  │  0.999454 │
╘═══════════════════╧═══════════╛


1. All three similarity measures return the same top 5 recommendations for our test instance.
2. The only difference is in how the scores are calculated, so the raw numbers alone are a misleading basis for choosing one over the other. That said, cosine similarity appears to be the most sensitive to differences and is currently the preferred content-based similarity model.
3. Additional model iterations will attempt to tease these content-based filtering results apart further, but for now they all seem to work as intended. The cosine similarity and Laplacian kernel are nice because their scores are bounded between 0 and 1.

# Collaborative Filtering - Under Construction WIP <a id='cf'></a>

Collaborative Filtering uses rankings to recommend new products to customers, and there are several approaches one can take. For this first iteration, we will use a model-based SVD (Matrix Factorization) approach on user-item interactions.

## Upload and Prep the Data

In [None]:
#import pandas as pd
#cat_rankings = pd.read_csv("/content/drive/MyDrive/MLE10PetMatch/petmatch_rankings_cats.csv",header=0,index_col=0)
#cat_rankings.shape

In [None]:
'''
rating dataframe will look like this
| user_id | item_id | rating          |
|---------|---------|-----------------|
| 1       | 1       | 5               |
| ...     | ...     | ...             |
| n       | m       | 3               |
'''
#cat_rankings.head(3)

First, let's make a train-test split.

In [None]:
# before we split, user ranking counts
#cat_rankings.groupby('user_name').count()

In [None]:
#cf_train, cf_test = train_test_split(cat_rankings,test_size=0.2,train_size=0.8, random_state=12)


In [None]:
#cf_test.groupby('user_name').count()

In [None]:
#cf_train.groupby('user_name').count()

This train-test split seems to generally keep an 80-20 balance among users. This should work.

As we can see, we need to fix our data first to match the correct format.

In [None]:
#user_item_mat_train = pd.DataFrame()
#user_item_mat_train['user']= pd.unique(cat_rankings.index) # add unique users first
#user_item_mat_train[[cats_DF['id']]] = 0 # assume dislike if no data
#user_item_mat_train.head(3)

In [None]:
#user_item_mat_test = user_item_mat_train.copy() # make blank template for test as well

In [None]:
#user_item_mat_test.head(3)

Now that we have default tables, let's update them with our saved rankings so far!

In [None]:
#for index, row in cf_train.iterrows(): #update train table
#    #print(index) # shows names
#    indextoChange = user_item_mat_train[user_item_mat_train['user']==index].index #our name to change
#    columntoChange = row[0] # animal to update
#    preferencetoChange = row[1] # ranking for animal to use
#    user_item_mat_train.at[indextoChange,columntoChange] = preferencetoChange # update cell in dataframe

In [None]:
#for index, row in cf_test.iterrows(): #update test table
#    #print(index) # shows names
#    indextoChange = user_item_mat_test[user_item_mat_test['user']==index].index #our name to change
#    columntoChange = row[0] # animal to update
#    preferencetoChange = row[1] # ranking for animal to use
#    user_item_mat_test.at[indextoChange,columntoChange] = preferencetoChange # update cell in dataframe
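The row-by-row updates above could also be written as a single pivot. This is only a sketch: the column names `user_name`, `item_id`, and `rating`, the sample rows, and the `all_items` list are assumptions, since the rankings file's actual layout is not shown here.

```python
import pandas as pd

# Hypothetical long-format rankings; column names are assumptions for illustration.
rankings = pd.DataFrame({
    "user_name": ["ana", "ana", "ben", "cy"],
    "item_id":   [11, 12, 11, 13],
    "rating":    [5, 1, 3, 4],
})
all_items = [11, 12, 13, 14]   # full catalogue, including items nobody has ranked

user_item = (
    rankings
    .pivot_table(index="user_name", columns="item_id", values="rating", aggfunc="last")
    .reindex(columns=all_items)   # add columns for items without any ranking
    .fillna(0)                    # same convention as above: 0 when there is no data
    .astype(int)
)
print(user_item)
```

Using `aggfunc="last"` keeps the most recent ranking when a user has ranked the same item more than once, which mirrors what overwriting cells in a loop would do.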

In [None]:
#user_item_mat_train.head(5)

In [None]:
#user_item_mat_test.head(5)

In [None]:
#user_item_mat_train.astype(bool).sum(axis=1) # preferences were correctly assigned!

In [None]:
#user_item_mat_test.astype(bool).sum(axis=1) # preferences were correctly assigned!

Collaborative Filtering needs numbers for users, so we rename our users and assign them a number instead.

In [None]:
#userTable = pd.DataFrame()
#userTable['user'] = user_item_mat_train['user']
#userTable['userId'] = pd.Series(range(0,user_item_mat_train['user'].shape[0]))
#userTable

In [None]:
# Now update our main table with userID's instead!
#user_item_mat_train['user'] = userTable['userId']
#user_item_mat_test['user'] = userTable['userId']
#user_item_mat_train.head(5)
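As an alternative sketch, `pd.factorize` produces the same kind of name-to-number mapping in one call. The user names below are made up for illustration.

```python
import pandas as pd

users = pd.Series(["ana", "ben", "ana", "cy"], name="user")   # illustrative names

# codes: one integer per row; uniques: the names in order of first appearance.
codes, uniques = pd.factorize(users)

# Lookup table for translating numeric ids back to names later on.
user_table = pd.DataFrame({"user": uniques, "userId": range(len(uniques))})
print(codes)
print(user_table)
```

Keeping the lookup table around makes it possible to show real user names again once the model has produced its predictions.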

This dataframe looks correct. We only have four users at the moment so we will need to find more.

## Run Collaborative Filtering

We will be using the surprise library for Collaborative Filtering.

In [None]:
#!pip install surprise

In [None]:
#from surprise import SVD, accuracy
#from surprise import Dataset, Reader
#from surprise.model_selection import cross_validate



In [None]:
'''
rating dataframe will look like this
| user_id | item_id | rating          |
|---------|---------|-----------------|
| 1       | 1       | 5               |
| ...     | ...     | ...             |
| n       | m       | 3               |
'''

# initial model
#algo = SVD(random_state = 42)
#algo.fit(train)
#pred = algo.test(test)

# evaluate the RMSE of the predictions against the ground truth
#accuracy.rmse(pred)

Now, we generate the similarity matrix on the user item matrix.

# Conclusion and Next Steps <a id='conclusion'></a>

**Conclusion of ML Modeling as of 1/2/23**: 
- All three content-based filtering models perform well
- Cosine Similarity appears to be the most sensitive to differences and has a very useful scale of 0-1.
- Can hook up content-based filtering models to PetMatch UI as-is and it should return good results based on overall similarity measures measured so far.
- The user rankings data generated requires more formatting than initially expected, but our application tracks all the key required fields for now.
- Collaborative Filtering is harder to implement than initially expected, but we have initial data to give it a try.

**Conclusion of ML Baseline as of 12/6/22**: 
- The average top 5 recommendation score per cat in the training set is 10.96. The highest available score is 12.
- The result above uses a simple content-based filtering recommendation model without user preferences, since they are currently not available. Instead it compares items against each other: you liked this ketchup, so here are 10 other similar types of ketchup.
- Due to the method used to create the simple content-based filtering model, the dev and test sets cannot be used, so the training set was used to get an initial idea of the results.
- The cats data version 0.5 features need more ways to delineate one cat from another, but based on visual scans and the average recommendation score, the simple cat CBF model generally excels at giving you cats similar to what you stated you wanted.
- In instances where there is more ambiguity (for example, a chosen cat with less defined details), it will still find cats very similar to it, but sometimes it can also throw in very similar cats of a different breed. This might not be a bad thing.

**Next Steps**:

- Get more user rankings!
- Incorporate distance more effectively
- Use cosine similarity with all dogs!
- Collaborative Filtering for item and user-based
  - Use surprise library possibly
  - Add timestamp to rankings so we can be time-sensitive in terms of recommendations