## Content-Based Restaurant Recommender System

Content-based recommenders suggest items that are similar to a given item. This system uses item metadata (for restaurants: locality, cuisine, rating, and so on) to make its recommendations. The underlying idea is that a person who liked a particular item will also like items similar to it, so we recommend restaurants similar to the one the user searched for. The quality of the recommendations depends heavily on defining an appropriate similarity measure. Finally, we select a subset of the most similar restaurants to display to the user.

We will use locality, cuisines, cost_for_two and rest_name to measure similarity between restaurants. Because most restaurant names are unique, a name rarely tells us that two restaurants are alike. We therefore give rest_name less weight when making the soup (the combined text string built for each restaurant) and more weight to locality, cost_for_two and cuisines.

In [266]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [238]:
rest =  pd.read_csv('Zomato_Cincinnati_Restaurant_Clean.csv')

In [239]:
rest.head()

Unnamed: 0,rest_id,rest_name,locality,cuisines,cost_for_two,ratings,votes
0,17114993,Terry's Turf Club,Hyde Park/Mount Lookout,"['Burger', 'Bar Food', 'Sandwich']",Moderately_priced,4.7,1038
1,17113965,Nada,Downtown,"['Breakfast', 'Mexican']",Pricey,4.6,933
2,17115298,Zip's Cafe,Hyde Park/Mount Lookout,"['Burger', 'Bar Food', 'Sandwich']",Inexpensive,4.7,694
3,17112031,Arthur's,Hyde Park/Mount Lookout,"['Burger', 'Bar Food', 'Sandwich']",Moderately_priced,4.7,632
4,17116251,Senate,Over-the-Rhine/Mount Auburn,['Bar Food'],Moderately_priced,4.6,596


In [241]:
from ast import literal_eval
rest['cuisines'] = rest['cuisines'].apply(literal_eval)
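
The reason for this step is visible in the first preview: after reading the CSV, each cuisines cell holds the text of a list rather than a list. A minimal sketch of what literal_eval does to one such cell (the sample string is copied from the preview above):

```python
from ast import literal_eval

# One cuisines cell as it arrives from the CSV: a string, not a list.
raw_cell = "['Burger', 'Bar Food', 'Sandwich']"

# literal_eval parses Python literals (lists, strings, numbers, ...)
# without executing arbitrary code, unlike eval.
parsed = literal_eval(raw_cell)

print(type(raw_cell).__name__)  # str
print(type(parsed).__name__)    # list
print(parsed[1])                # Bar Food
```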

In [242]:
rest.drop(columns=['votes','ratings'],inplace=True)

In [243]:
rest.head()

Unnamed: 0,rest_id,rest_name,locality,cuisines,cost_for_two
0,17114993,Terry's Turf Club,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Moderately_priced
1,17113965,Nada,Downtown,"[Breakfast, Mexican]",Pricey
2,17115298,Zip's Cafe,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Inexpensive
3,17112031,Arthur's,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Moderately_priced
4,17116251,Senate,Over-the-Rhine/Mount Auburn,[Bar Food],Moderately_priced


Create a copy of the DataFrame. We sanitize the copy and keep the original for displaying results.

In [244]:
restc = rest.copy()

In [245]:
restc.head()

Unnamed: 0,rest_id,rest_name,locality,cuisines,cost_for_two
0,17114993,Terry's Turf Club,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Moderately_priced
1,17113965,Nada,Downtown,"[Breakfast, Mexican]",Pricey
2,17115298,Zip's Cafe,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Inexpensive
3,17112031,Arthur's,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Moderately_priced
4,17116251,Senate,Over-the-Rhine/Mount Auburn,[Bar Food],Moderately_priced


In [248]:
restc['cuisines']

0       [Burger, Bar Food, Sandwich]            
1       [Breakfast, Mexican]                    
2       [Burger, Bar Food, Sandwich]            
3       [Burger, Bar Food, Sandwich]            
4       [Bar Food]                              
5       [Indian, Pakistani, Vegetarian]         
6       [BBQ, Southern]                         
7       [Burger, Bar Food]                      
8       [Mexican, California]                   
9       [Italian, Pizza]                        
10      [German]                                
11      [BBQ, Bar Food]                         
12      [Sandwich, Vegetarian]                  
13      [Belgian, European, International]      
14      [Sushi, Thai, Vegetarian]               
15      [Southern]                              
16      [Steak]                                 
17      [Desserts, Italian, Pizza]              
18      [French, Italian, Tapas]                
19      [Pizza]                                 
20      [Seafood, Ca

The cuisine data contains multi-word entries such as Fast Food and Bar Food. A word-level vectorizer would split these into separate words, so the two cuisines would share the word "food" and look partly similar even though they are clearly different. To prevent this, we strip the spaces inside each cuisine name. We also convert all the text data we use to lowercase.
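
The effect can be checked on the two cuisines mentioned above. This is a plain-Python sketch: it assumes the regular expression below, which mirrors the default token_pattern of scikit-learn's CountVectorizer, and the helper name tokenize is made up for illustration.

```python
import re

# Mirrors CountVectorizer's default token pattern:
# runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text.lower())

raw_a, raw_b = "Fast Food", "Bar Food"

# With the spaces kept, both cuisines contribute the token "food".
shared_raw = set(tokenize(raw_a)) & set(tokenize(raw_b))

# With the spaces stripped, each cuisine becomes a single distinct token.
clean_a = raw_a.replace(" ", "").lower()
clean_b = raw_b.replace(" ", "").lower()
shared_clean = set(tokenize(clean_a)) & set(tokenize(clean_b))

print(shared_raw)    # {'food'}
print(shared_clean)  # set()
```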

In [249]:
# Function to sanitize data to prevent ambiguity.
# Lists (cuisines): remove spaces inside each entry and convert to lowercase.
# Strings: remove apostrophes and slashes and convert to lowercase.
# Anything else (for example NaN) becomes an empty string.
def sanitize(x):
    if isinstance(x, list):
        return [i.replace(" ", "").lower() for i in x]
    elif isinstance(x, str):
        return x.replace("'", "").replace("/", "").lower()
    else:
        return ''

In [250]:
for feature in ['rest_name', 'locality', 'cuisines', 'cost_for_two']:
    restc[feature] = restc[feature].apply(sanitize)

In [251]:
type(restc['cuisines'].iloc[0])

list

In [252]:
restc.head()

Unnamed: 0,rest_id,rest_name,locality,cuisines,cost_for_two
0,17114993,terrys turf club,hyde parkmount lookout,"[burger, barfood, sandwich]",moderately_priced
1,17113965,nada,downtown,"[breakfast, mexican]",pricey
2,17115298,zips cafe,hyde parkmount lookout,"[burger, barfood, sandwich]",inexpensive
3,17112031,arthurs,hyde parkmount lookout,"[burger, barfood, sandwich]",moderately_priced
4,17116251,senate,over-the-rhinemount auburn,[barfood],moderately_priced


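Our model should give more importance to locality, cuisines and cost_for_two than to the restaurant name. We repeat them in the soup to give them more weight: locality appears three times, cuisines and cost_for_two twice each, and rest_name only once.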

In [253]:
# Function that creates a soup out of the desired metadata.
# rest_name appears once, cuisines and cost_for_two twice, locality three times.
def create_soup(x):
    cuisines = ' '.join(x['cuisines'])
    weighted = [x['locality'], cuisines, x['cost_for_two']] * 2 + [x['locality']]
    return ' '.join([x['rest_name']] + weighted)


In [254]:
restc['soup'] = restc.apply(create_soup, axis=1)

In [255]:
# Show full column contents; pandas >= 1.0 expects None here (older versions used -1)
pd.set_option('display.max_colwidth', None)

In [256]:
restc['soup'].head()

0    terrys turf club hyde parkmount lookout burger barfood sandwich moderately_priced hyde parkmount lookout burger barfood sandwich moderately_priced hyde parkmount lookout
1    nada downtown breakfast mexican pricey downtown breakfast mexican pricey downtown                                                                                        
2    zips cafe hyde parkmount lookout burger barfood sandwich inexpensive hyde parkmount lookout burger barfood sandwich inexpensive hyde parkmount lookout                   
3    arthurs hyde parkmount lookout burger barfood sandwich moderately_priced hyde parkmount lookout burger barfood sandwich moderately_priced hyde parkmount lookout         
4    senate over-the-rhinemount auburn barfood moderately_priced over-the-rhinemount auburn barfood moderately_priced over-the-rhinemount auburn                              
Name: soup, dtype: object
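
To see why repeating a field raises its weight, here is a rough sketch that computes cosine similarity over word counts in plain Python. The two short soups are invented for illustration and the cosine helper is a stand-in, not the scikit-learn function used later.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between the word-count vectors of two strings."""
    ca, cb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(ca[token] * cb[token] for token in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two hypothetical restaurants that share only their locality.
once_a = "nada downtown mexican"
once_b = "senate downtown barfood"

# The same restaurants with the locality mentioned three times.
thrice_a = "nada downtown mexican downtown downtown"
thrice_b = "senate downtown barfood downtown downtown"

unweighted = cosine(once_a, once_b)
weighted = cosine(thrice_a, thrice_b)

print(round(unweighted, 3))  # 0.333
print(round(weighted, 3))    # 0.818
```

The shared locality dominates the score once it is repeated, which is the behaviour the soup relies on.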

With the soup created, we are now in a good place to compute similarity scores and build a content-based recommender system. We will use CountVectorizer here to turn each soup into a vector of word counts.

## Steps to Create the Content-Based Recommender

1. Declare the restaurant name as an argument.
2. Obtain the index of the restaurant name from the indices reverse mapping.
3. Get the list of cosine similarity scores between that restaurant and all restaurants using cosine_sim. Convert this into a list of tuples where the first element is the position and the second is the similarity score.
4. Sort this list of tuples in descending order of cosine similarity score.
5. Skip the first element, which is the restaurant's similarity with itself, and keep the next 10 elements.
6. Return the restaurant details corresponding to the indices of those 10 elements.
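
Dropping the first element assumes the searched restaurant always sorts to the top. That may not hold if another restaurant has an identical soup and an equal score. The sketch below, on an invented similarity row, compares the listed steps with a variant that filters the query out by position; top_k_excluding_self is a hypothetical helper, not part of the notebook.

```python
def top_k_excluding_self(sim_row, idx, k):
    """Rank all items by similarity to item `idx`, leaving `idx` itself out."""
    scored = [(i, s) for i, s in enumerate(sim_row) if i != idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy similarity row for the query at position 2;
# position 0 is assumed to have an identical soup, hence a score of 1.0.
sim_row = [1.0, 0.4, 1.0, 0.7]
query_idx = 2

# Steps 3 to 5 as listed: sort everything, then drop the first element.
naive = sorted(enumerate(sim_row), key=lambda pair: pair[1], reverse=True)[1:3]

# Variant: remove the query by position before ranking.
robust = top_k_excluding_self(sim_row, query_idx, k=2)

print(naive)   # [(2, 1.0), (3, 0.7)]  the query itself slips in
print(robust)  # [(0, 1.0), (3, 0.7)]
```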

In [258]:
#Define a new CountVectorizer object and create vectors for the soup
count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(restc['soup'])

In [259]:
#Compute the cosine similarity score 
cosine_sim = cosine_similarity(count_matrix, count_matrix)

In [260]:
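# Reset the index so that row labels match row positions in cosine_sim
restc = restc.reset_index()
# Build the reverse mapping from sanitized restaurant name to row position
# (the lookup assumes names are unique; a duplicated name would return several positions)
indices = pd.Series(restc.index, index=restc['rest_name'])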

In [264]:
# Function that takes in a (sanitized) restaurant name as input and gives recommendations
# df is not used inside the function; df_orig supplies the unsanitized rows to display
def content_recommender(rest_name, cosine_sim, df, indices, df_orig):
    # Obtain the index of the restaurant that matches the name
    idx = indices[rest_name]

    # Get the pairwise similarity scores of all restaurants with the input restaurant
    # and convert them into a list of (position, score) tuples as described above
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the restaurants by cosine similarity score, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar restaurants. Ignore the first restaurant as it will be the input restaurant itself.
    sim_scores = sim_scores[1:11]

    # Get the restaurant indices
    rest_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar restaurants
    return df_orig.iloc[rest_indices]

In [265]:
content_recommender('terrys turf club', cosine_sim, restc, indices,rest)

Unnamed: 0,rest_id,rest_name,locality,cuisines,cost_for_two
3,17112031,Arthur's,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Moderately_priced
190,17114520,Rusty Bucket Restaurant and Tavern,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Moderately_priced
318,17116611,Dutch's,Hyde Park/Mount Lookout,[Bar Food],Moderately_priced
2,17115298,Zip's Cafe,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Inexpensive
235,17116863,Keystone Bar & Grill,Hyde Park/Mount Lookout,"[Burger, Bar Food, Sandwich]",Inexpensive
310,17117195,Wurst Bar in the Square,Hyde Park/Mount Lookout,[Bar Food],Moderately_priced
630,17112272,Buffalo Wild Wings,Hyde Park/Mount Lookout,[Bar Food],Moderately_priced
135,17116691,Cock & Bull Public House,Hyde Park/Mount Lookout,"[American, Burger]",Moderately_priced
570,17113654,LongHorn Steakhouse,Hyde Park/Mount Lookout,"[Burger, Bar Food, Steak]",Pricey
845,17116005,The Redmoor,Hyde Park/Mount Lookout,[American],Moderately_priced
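
Note that the call above passes the sanitized name 'terrys turf club', not the display name. A small sketch of how a raw name could be normalized before the lookup follows. Both helper names are hypothetical, and a plain dict stands in for the indices Series.

```python
def normalize_name(name):
    """Apply the same string clean-up as sanitize: drop apostrophes and slashes, lowercase."""
    return name.replace("'", "").replace("/", "").lower()

def find_index(raw_name, name_to_index):
    """Return the row position for a raw display name, or None if it is unknown."""
    return name_to_index.get(normalize_name(raw_name))

# Stand-in for the `indices` Series, built from the first rows shown earlier.
name_to_index = {"terrys turf club": 0, "nada": 1, "zips cafe": 2}

print(find_index("Terry's Turf Club", name_to_index))  # 0
print(find_index("Unknown Diner", name_to_index))      # None
```

Returning None for an unknown name lets the caller show a friendly message instead of hitting a KeyError.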
