## Knowledge based recommendors 

These recommenders are used for items that are very rarely bought. It is simply impossible to recommend such items based on past purchasing activity or by building a user profile. Take real estate, for instance. Real estate is usually a once-in-a-lifetime purchase for a family. It is not possible to have a history of real estate purchases for existing users to leverage into a collaborative filter, nor is it always feasible to ask a user their real estate purchase history.

Knowledge based recommendors are not the first choice for building restaurant recommendor for Zomato. But since our data was scraped and we do not have individual user rating info and details, we cannot implement collaborative filter which would be the optimal solution for this case.

Let us see which restaurants will make to the top using knowledge based recommendors on this data.

In [1]:
import pandas as pd
import numpy as np
from ast import literal_eval

Reading the cleaned data as a dataframe

In [2]:
df =  pd.read_csv('Zomato_Cincinnati_Restaurant_Clean.csv')

In [3]:
df.head()

Unnamed: 0,rest_id,rest_name,locality,cuisines,cost_for_two,ratings,votes
0,17114993,Terry's Turf Club,Hyde Park/Mount Lookout,"['Burger', 'Bar Food', 'Sandwich']",Moderately_priced,4.7,1038
1,17113965,Nada,Downtown,"['Breakfast', 'Mexican']",Pricey,4.6,933
2,17115298,Zip's Cafe,Hyde Park/Mount Lookout,"['Burger', 'Bar Food', 'Sandwich']",Inexpensive,4.7,694
3,17112031,Arthur's,Hyde Park/Mount Lookout,"['Burger', 'Bar Food', 'Sandwich']",Moderately_priced,4.7,632
4,17116251,Senate,Over-the-Rhine/Mount Auburn,['Bar Food'],Moderately_priced,4.6,596


In [4]:
df.tail()

Unnamed: 0,rest_id,rest_name,locality,cuisines,cost_for_two,ratings,votes
1897,17414944,Papa John's Pizza,Lawrenceburg,"['Fast Food', 'Pizza']",Inexpensive,3.2,1
1898,17414928,Frisch's Big Boy,Lawrenceburg,['American'],Inexpensive,3.2,1
1899,17412859,Dairy Queen,Aurora,"['Fast Food', 'Ice Cream']",Inexpensive,3.0,4
1900,17414931,Gold Star Chili,Lawrenceburg,['Fast Food'],Inexpensive,3.2,4
1901,17418717,Bob O's at the Levee,Lawrenceburg,"['Burger', 'Sandwich', 'Healthy Food']",Inexpensive,3.2,4


In [5]:
df.columns

Index(['rest_id', 'rest_name', 'locality', 'cuisines', 'cost_for_two',
       'ratings', 'votes'],
      dtype='object')

This will be a simple recommendor system that will perform the following tasks. Ask the user for her/his preferences of :
* Locality
* Cusinies
* Budget for restaurant 

Let's convert the cuisines to a list form

In [6]:
type(df['cuisines'].iloc[0])

str

In [7]:
df['cuisines'] = df['cuisines'].apply(literal_eval)

In [8]:
type(df['cuisines'].iloc[0])

list

In [9]:
df=df.reset_index().drop('index',axis=1)

Let's explode the cuisines column. In other words, if a particular restaurant has multiple cuisines, we will create multiple copies of the restaurant, with each movie having one of the cuisines in the list.

In [10]:
#Create a new feature by exploding cuisines
s = df.apply(lambda x: pd.Series(x['cuisines']),axis=1).stack().reset_index(level=1, drop=True)

#Name the new feature as 'cuisine'
s.name = 'cuisine'

#Create a new dataframe cuisine_df which by dropping the old 'cuisines' feature and adding the new 'cuisine'.
cuisine_df = df.drop('cuisines', axis=1).join(s)

#Print the head of the new cuisine_df
cuisine_df.head()

Unnamed: 0,rest_id,rest_name,locality,cost_for_two,ratings,votes,cuisine
0,17114993,Terry's Turf Club,Hyde Park/Mount Lookout,Moderately_priced,4.7,1038,Burger
0,17114993,Terry's Turf Club,Hyde Park/Mount Lookout,Moderately_priced,4.7,1038,Bar Food
0,17114993,Terry's Turf Club,Hyde Park/Mount Lookout,Moderately_priced,4.7,1038,Sandwich
1,17113965,Nada,Downtown,Pricey,4.6,933,Breakfast
1,17113965,Nada,Downtown,Pricey,4.6,933,Mexican


In [11]:
cuisine_df['locality'].unique()

array(['Hyde Park/Mount Lookout', 'Downtown',
       'Over-the-Rhine/Mount Auburn', 'Clifton/Avondale',
       'East End/Mount Washington', 'Campbell County', 'Montgomery',
       'Northside', 'Springdale', 'Blue Ash', 'Covington',
       'Price Hill/Fairmount', 'Mariemont', 'Norwood', 'West Chester',
       'Deer Park/Madeira', 'Madisonville', 'Cold Spring/Alexandria',
       'Mason', 'Symmes', 'Loveland', 'Fairfax',
       'Walnut Hills/Mount Adams', 'Oakley/Pleasant Ridge',
       'Crescent Springs/Fort Wright', 'Cheviot/Westwood',
       'Anderson Township', 'Milford',
       'Wyoming/Arlington Heights/Reading', 'Florence', 'Sharonville',
       'Independence', 'Colerain Township', 'Fairfield',
       'College Hill/Mt Healthy', 'Monfort Heights/White Oak',
       'Plaineville', 'Lebanon', 'Golf Manor', 'Withamsville', 'Landen',
       'St Bernard/Elmwood Place', 'Erlanger', 'Oxford',
       'Crestview Hills', 'Newtown', 'Cleves', 'Delhi', 'Union Township',
       'Batavia', 'Hamilt

In [18]:
def find_resturants(df, percentile=0.8):
    #Ask for preferred locality
    print("Input preferred Locality")
    locality = input()
    
     #Ask for preferred budget
    print("Input preferred budget")
    budget = input()
    
    #Ask for preferred cuisine
    print("Input preferred cuisine")
    cuisine = input()
    
    #Define a new rest variable to store the preferred rest. Copy the contents of df to rest
    rest = df.copy()
    
    #Filter based on the condition
    rest = rest[(rest['locality'] == locality) & 
                    (rest['cost_for_two'] == budget) & 
                    (rest['cuisine'] == cuisine)]
    
    #Compute the values of C and m for the filtered rest
    C = rest['ratings'].mean()
    m = rest['votes'].quantile(percentile)
    
    #Only consider restaurants that have higher than m votes. Save this in a new dataframe m_rest
    m_rest = rest.copy().loc[rest['votes'] >= m]
    
    #Calculate score using the weighted avg formula
    m_rest['score'] = m_rest.apply(lambda x: (x['votes']/(x['votes']+m) * x['ratings']) 
                                       + (m/(m+x['ratings']) * C)
                                       ,axis=1)

    #Sort restaurants in descending order of their scores
    m_rest = m_rest.sort_values('score', ascending=False)
    
    return m_rest

In [19]:
find_resturants(cuisine_df).head()

Input preferred Locality
Downtown
Input preferred budget
Moderately_priced
Input preferred cuisine
Sandwich


Unnamed: 0,rest_id,rest_name,locality,cost_for_two,ratings,votes,cuisine,score
33,17112027,Arnold's Bar & Grill,Downtown,Moderately_priced,4.3,319,Sandwich,6.095649
30,17114481,Rock Bottom Brewery,Downtown,Moderately_priced,4.0,305,Sandwich,5.87424


In [20]:
find_resturants(cuisine_df).head()

Input preferred Locality
Downtown
Input preferred budget
Pricey
Input preferred cuisine
Bar Food


Unnamed: 0,rest_id,rest_name,locality,cost_for_two,ratings,votes,cuisine,score
7,17116915,Moerlein Lager House,Downtown,Pricey,4.2,503,Bar Food,6.253535
