# Capstone : Food Recommender

----------

## Contents:

1. [Importing Libraries](#1.-Importing-Libraries)
2. [Importing Data](#2.-Importing-Data)
3. [Merging Data](#3.-Merging-Data)
4. [Modelling](#4.-Modelling)
5. [Similar Restaurants](#5.-Similar-Restaurants)
6. [Conclusions](#6.-Conclusions)
7. [Future Works](#7.-Future-Works)
8. [References](#8.-References)

## Part 4
Hybrid Model Restaurants Recommender

---

## 1. Importing Libraries

In [41]:
## Importing Basic Packages
import numpy as np
import pandas as pd
import itertools
import matplotlib.pyplot as plt
import seaborn as sns
import operator
import pickle

## Importing LightFM Packages
from lightfm import LightFM
from lightfm.data import Dataset
from lightfm import cross_validation
from lightfm.evaluation import auc_score
from lightfm.evaluation import precision_at_k as lightfm_prec_at_k
from lightfm.evaluation import recall_at_k as lightfm_recall_at_k
from recommenders.models.lightfm.lightfm_utils import similar_items

pd.set_option('display.max_columns', None)

## 2. Importing Data

### 2.1 Central Business Details Data

In [42]:
# Read in central pickle file
central = pd.read_pickle('../data/central_cluster.pkl')

# Check the first 5 rows of the central dataframe
central.head()

Unnamed: 0,id,alias,name,is_closed,review,categories,restaurants_rating,price,address,zip_code,country,lat,long,planning_area,cluster
0,vVqxGrqt5ALxQjJGnntpKQ,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4
1,7pVbUENiUjg6u6BWKAnxgA,holycrab-singapore,Holycrab,False,17,"Singaporean, Chinese, Seafood",4.5,Expensive,"2 Tan Quee Lan St, #01-03, Singapore 188091, S...",188091,SG,1.29807,103.85697,DOWNTOWN CORE,9
2,6LgTc7CZlXCd1Pxq3BiVcw,jumbo-seafood-singapore-4,Jumbo Seafood,False,182,Seafood,4.0,Expensive,"20 Upper Circular Road, #B1-48, Singapore 0584...",58416,SG,1.288929,103.848374,SINGAPORE RIVER,17
3,wyDfBs1tYSIiBO7HCPKNTg,two-men-bagel-house-singapore,Two Men Bagel House,False,76,"Bagels, Breakfast & Brunch, Sandwiches",4.5,Inexpensive,"16 Enggor St, #01-12, Singapore 079717, Singapore",79717,SG,1.274531,103.844383,DOWNTOWN CORE,4
4,xsaHJx_tkVj1RArC2Fr3PA,sungei-road-laksa-singapore-2,Sungei Road Laksa,False,70,"Singaporean, Chinese, Noodles",4.5,Inexpensive,"27 Jalan Berseh, #01-100, Jin Shui Kopitiam, S...",200027,SG,1.306734,103.857772,ROCHOR,5


### 2.2 Central Business Review Data

In [43]:
# Read in reviews csv file
reviews = pd.read_csv('../data/reviews_cleaned.csv')

# Check the first 5 rows of the reviews dataframe
reviews.head()

Unnamed: 0,url,username,userid,businessid,comment_text,comment_date,user_rating,neg,neu,pos,compound,compute_score
0,https://www.yelp.com/biz/burnt-ends-singapore?...,Nik T.,2jTqpqHAQnIzIfwHdaOiqQ,vVqxGrqt5ALxQjJGnntpKQ,"used to loathe them, now i love them!what's ch...",2020-07-12,5,0.038,0.82,0.142,0.9862,pos
1,https://www.yelp.com/biz/burnt-ends-singapore?...,Vincent Q.,i-PdP5aXeLGG-Bmg1Wd0KQ,vVqxGrqt5ALxQjJGnntpKQ,Burnt ends is one of the toughest places to ge...,2019-11-22,5,0.066,0.771,0.162,0.9937,pos
2,https://www.yelp.com/biz/burnt-ends-singapore?...,Jonathan C.,cbzM6kE426dOaTvj9NPqww,vVqxGrqt5ALxQjJGnntpKQ,It was a night that I will never forget. This ...,2019-08-14,5,0.004,0.793,0.202,0.9928,pos
3,https://www.yelp.com/biz/burnt-ends-singapore?...,Miguel E.,ztVCqx-qqF1HMxl5McgUzw,vVqxGrqt5ALxQjJGnntpKQ,I get the hype with the experimental drinks an...,2019-10-31,3,0.048,0.827,0.125,0.8625,pos
4,https://www.yelp.com/biz/burnt-ends-singapore?...,Lisa I.,-g3XIcCb2b-BD0QBCcq2Sw,vVqxGrqt5ALxQjJGnntpKQ,TAKE ALL MY MONEY! After a 16 hour flight to S...,2019-12-13,5,0.021,0.908,0.071,0.9031,pos


## 3. Merging Data

In [44]:
# Combine central & reviews dataframe
df = central.merge(reviews, how = 'inner', left_on = 'id', right_on = 'businessid')

# Check the first 5 rows of the combined dataframe
df.head()

Unnamed: 0,id,alias,name,is_closed,review,categories,restaurants_rating,price,address,zip_code,country,lat,long,planning_area,cluster,url,username,userid,businessid,comment_text,comment_date,user_rating,neg,neu,pos,compound,compute_score
0,vVqxGrqt5ALxQjJGnntpKQ,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Nik T.,2jTqpqHAQnIzIfwHdaOiqQ,vVqxGrqt5ALxQjJGnntpKQ,"used to loathe them, now i love them!what's ch...",2020-07-12,5,0.038,0.82,0.142,0.9862,pos
1,vVqxGrqt5ALxQjJGnntpKQ,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Vincent Q.,i-PdP5aXeLGG-Bmg1Wd0KQ,vVqxGrqt5ALxQjJGnntpKQ,Burnt ends is one of the toughest places to ge...,2019-11-22,5,0.066,0.771,0.162,0.9937,pos
2,vVqxGrqt5ALxQjJGnntpKQ,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Jonathan C.,cbzM6kE426dOaTvj9NPqww,vVqxGrqt5ALxQjJGnntpKQ,It was a night that I will never forget. This ...,2019-08-14,5,0.004,0.793,0.202,0.9928,pos
3,vVqxGrqt5ALxQjJGnntpKQ,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Miguel E.,ztVCqx-qqF1HMxl5McgUzw,vVqxGrqt5ALxQjJGnntpKQ,I get the hype with the experimental drinks an...,2019-10-31,3,0.048,0.827,0.125,0.8625,pos
4,vVqxGrqt5ALxQjJGnntpKQ,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Lisa I.,-g3XIcCb2b-BD0QBCcq2Sw,vVqxGrqt5ALxQjJGnntpKQ,TAKE ALL MY MONEY! After a 16 hour flight to S...,2019-12-13,5,0.021,0.908,0.071,0.9031,pos


In [45]:
# Drop the id column since it duplicates businessid
df.drop(columns = 'id', inplace = True)

# Check the first 5 rows of the combined dataframe
df.head()

Unnamed: 0,alias,name,is_closed,review,categories,restaurants_rating,price,address,zip_code,country,lat,long,planning_area,cluster,url,username,userid,businessid,comment_text,comment_date,user_rating,neg,neu,pos,compound,compute_score
0,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Nik T.,2jTqpqHAQnIzIfwHdaOiqQ,vVqxGrqt5ALxQjJGnntpKQ,"used to loathe them, now i love them!what's ch...",2020-07-12,5,0.038,0.82,0.142,0.9862,pos
1,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Vincent Q.,i-PdP5aXeLGG-Bmg1Wd0KQ,vVqxGrqt5ALxQjJGnntpKQ,Burnt ends is one of the toughest places to ge...,2019-11-22,5,0.066,0.771,0.162,0.9937,pos
2,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Jonathan C.,cbzM6kE426dOaTvj9NPqww,vVqxGrqt5ALxQjJGnntpKQ,It was a night that I will never forget. This ...,2019-08-14,5,0.004,0.793,0.202,0.9928,pos
3,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Miguel E.,ztVCqx-qqF1HMxl5McgUzw,vVqxGrqt5ALxQjJGnntpKQ,I get the hype with the experimental drinks an...,2019-10-31,3,0.048,0.827,0.125,0.8625,pos
4,burnt-ends-singapore,Burnt Ends,False,71,"Australian, Steakhouses, Barbeque",4.5,Expensive,"20 Teck Lim Rd, Singapore 088391, Singapore",88391,SG,1.28056,103.84175,OUTRAM,4,https://www.yelp.com/biz/burnt-ends-singapore?...,Lisa I.,-g3XIcCb2b-BD0QBCcq2Sw,vVqxGrqt5ALxQjJGnntpKQ,TAKE ALL MY MONEY! After a 16 hour flight to S...,2019-12-13,5,0.021,0.908,0.071,0.9031,pos


## 4. Modelling

### 4.1 Prepare Data and Features

#### a) Data

In [46]:
# Filtering for Selected Columns
data = df[['userid', 'businessid', 'user_rating', 'categories']]

# Check the first 5 rows of the filtered dataframe
data.head()

Unnamed: 0,userid,businessid,user_rating,categories
0,2jTqpqHAQnIzIfwHdaOiqQ,vVqxGrqt5ALxQjJGnntpKQ,5,"Australian, Steakhouses, Barbeque"
1,i-PdP5aXeLGG-Bmg1Wd0KQ,vVqxGrqt5ALxQjJGnntpKQ,5,"Australian, Steakhouses, Barbeque"
2,cbzM6kE426dOaTvj9NPqww,vVqxGrqt5ALxQjJGnntpKQ,5,"Australian, Steakhouses, Barbeque"
3,ztVCqx-qqF1HMxl5McgUzw,vVqxGrqt5ALxQjJGnntpKQ,3,"Australian, Steakhouses, Barbeque"
4,-g3XIcCb2b-BD0QBCcq2Sw,vVqxGrqt5ALxQjJGnntpKQ,5,"Australian, Steakhouses, Barbeque"


In [47]:
# Create a dataframe with restaurant id, name, address, price, lat, long & cluster to merge with our predictions later

restaurant_name = central[['id','name', 'address', 'price', 'lat', 'long', 'cluster']]
restaurant_name.reset_index(drop = True, inplace = True)

#### b) Item features

In [48]:
# split the restaurants categories based on the comma

restaurants_category = [x.split(', ') for x in data['categories']]
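The category list shown further down contains an empty string (`''`), which suggests that at least one restaurant has an empty `categories` value. As a minimal sketch, a split that strips whitespace and drops empty labels could look like the following; `split_categories` is a hypothetical helper and the sample strings are illustrative, not part of the notebook's pipeline:

```python
def split_categories(raw):
    """Split a comma-separated category string into clean labels."""
    # Strip surrounding whitespace and skip empty pieces such as '' or ' '
    return [part.strip() for part in raw.split(',') if part.strip()]

# Illustrative inputs, including an empty categories value
examples = ['Australian, Steakhouses, Barbeque', 'Seafood', '']
cleaned = [split_categories(x) for x in examples]
```

Using such a helper would change the number of item features, so any shapes reported later would need to be recomputed.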

In [49]:
# retrieve all the unique restaurant categories in the data
all_restaurants_category = sorted(list(set(itertools.chain.from_iterable(restaurants_category))))

# quick look at all the restaurant categories within the data
all_restaurants_category

['',
 'American (Traditional)',
 'Arts & Entertainment',
 'Asian Fusion',
 'Australian',
 'Bagels',
 'Bakeries',
 'Barbeque',
 'Bars',
 'Beach Bars',
 'Beer',
 'Beer Bar',
 'Bikes',
 'Bistros',
 'Brasseries',
 'Brazilian',
 'Breakfast & Brunch',
 'Breweries',
 'British',
 'Bubble Tea',
 'Buffets',
 'Burgers',
 'Burmese',
 'Cafes',
 'Cantonese',
 'Caribbean',
 'Chicken Shop',
 'Chicken Wings',
 'Chinese',
 'Chocolatiers & Shops',
 'Cocktail Bars',
 'Coffee & Tea',
 'Coffee Roasteries',
 'Congee',
 'Creperies',
 'Cupcakes',
 'Dance Clubs',
 'Delis',
 'Desserts',
 'Dim Sum',
 'Diners',
 'Do-It-Yourself Food',
 'Dumplings',
 'Farmers Market',
 'Fast Food',
 'Filipino',
 'Food',
 'Food Court',
 'Food Delivery Services',
 'Food Stands',
 'French',
 'Furniture Stores',
 'Fuzhou',
 'Gastropubs',
 'Gelato',
 'German',
 'Gluten-Free',
 'Greek',
 'Hainan',
 'Halal',
 'Hawker Centre',
 'Health Markets',
 'Henghwa',
 'Himalayan/Nepalese',
 'Hokkien',
 'Home Decor',
 'Hot Pot',
 'Ice Cream & Frozen 

### 4.2 LightFM model (A pure collaborative filtering model)

Collaborative filtering is one of the most common techniques for building recommender systems that learn to give better recommendations as more information about users is collected. Many websites, such as Amazon, YouTube, and Netflix, use collaborative filtering as part of their recommendation systems.

Collaborative filtering is a technique that can filter out items that a user might like on the basis of reactions by similar users. It works by searching a large group of people and finding a smaller set of users with tastes similar to a particular user. It looks at the items they like and combines them to create a ranked list of suggestions. 
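The idea described above can be sketched as a small neighbourhood-based recommender in plain Python. This is only an illustration of the "similar users" intuition; LightFM itself learns latent factors rather than comparing users directly, and the users, restaurants and ratings below are made up:

```python
from math import sqrt

# Toy ratings: user -> {restaurant: rating}; all names and values are made up
ratings = {
    'alice': {'A': 5, 'B': 4, 'C': 1},
    'bob':   {'A': 5, 'B': 5, 'D': 4},
    'carol': {'A': 1, 'C': 5, 'E': 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, ratings, top_n=3):
    """Rank items the target has not rated by similarity-weighted ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their)
        for item, rating in their.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

suggestions = recommend('alice', ratings)
```

Here `alice` is closer to `bob` than to `carol`, so the restaurant only `bob` rated is ranked ahead of the one only `carol` rated.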

#### a) Defining variables

In [50]:
# model learning rate
learning_rate = 0.25

# no of latent factors
no_components = 20

# no of epochs to fit model
no_epochs = 20

# no of threads to fit model
no_threads = 32

# regularisation for item features
item_alpha = 1e-6


#### b) Fit the dataset

In [51]:
# Create a Dataset instance to hold the interaction matrix
dataset = Dataset()

# Use fit method to create users/restaurants id mappings
dataset.fit(users = data['userid'], items = data['businessid'])

# Number of unique user & item in the data
num_users, num_restaurant = dataset.interactions_shape()
print(f'Number of users : {num_users}, Number of restaurants : {num_restaurant}.')

Number of users : 5482, Number of restaurants : 1150.


In [52]:
# Build interaction matrix
(interactions, weight) = dataset.build_interactions([(x['userid'],
                                                       x['businessid'],
                                                       x['user_rating']) for index,x in data.iterrows()])
                                                    
# Train - Test Split
# LightFM expects the train and test sets to have the same dimensions,
# so a conventional row-wise train-test split will not work.
# cross_validation.random_train_test_split splits the interaction data into two disjoint training and test sets
                                                 
train_interactions, test_interactions = cross_validation.random_train_test_split(interactions, test_percentage = 0.25, random_state = np.random.RandomState(42))

In [53]:
print(f"Shape of train interactions: {train_interactions.shape}")
print(f"Shape of test interactions: {test_interactions.shape}")

Shape of train interactions: (5482, 1150)
Shape of test interactions: (5482, 1150)
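Both matrices keep the full (users, restaurants) shape because the split holds out individual interactions rather than whole users or restaurants. A minimal sketch of that idea in plain Python, assuming interactions are stored as `(user, item)` pairs; `split_interactions` is a hypothetical helper, not LightFM's implementation:

```python
import random

def split_interactions(pairs, test_percentage=0.25, seed=42):
    """Shuffle (user, item) pairs and hold out a fraction for testing."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cutoff = int(len(shuffled) * (1 - test_percentage))
    return shuffled[:cutoff], shuffled[cutoff:]

# Toy interactions on a 4-user x 3-item grid (made-up data)
pairs = [(u, i) for u in range(4) for i in range(3) if (u + i) % 2 == 0]
train, test = split_interactions(pairs)
```

Every pair ends up in exactly one of the two sets, so a user or restaurant can appear in both while no single interaction does.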


#### c) Fit the Model

In [54]:
# Fit the model

model = LightFM(loss = 'warp', no_components = no_components, learning_rate = learning_rate, random_state = np.random.RandomState(42))

%time model = model.fit(interactions = train_interactions, epochs = no_epochs, num_threads = no_threads)

Wall time: 710 ms


#### d) Model Evaluation

In [55]:
# Train & Test AUC Score
train_auc_cf = auc_score(model, train_interactions, num_threads = no_threads).mean()
test_auc_cf = auc_score(model, test_interactions, num_threads = no_threads).mean()
    
# Test Precision & Recall Score
precision_cf = lightfm_prec_at_k(model, test_interactions, train_interactions, k = 10).mean()
recall_cf = lightfm_recall_at_k(model, test_interactions, train_interactions, k = 10).mean()

print(
    "\n------ Using LightFM evaluation methods ------",
    f"Precision@K:\t{precision_cf:.6f}",
    f"Recall@K:\t{recall_cf:.6f}", 
    f"Collaborative filtering training set AUC:\t{train_auc_cf:.6f}", 
    f"Collaborative filtering testing set AUC:\t{test_auc_cf:.6f}", 
    sep='\n')


------ Using LightFM evaluation methods ------
Precision@K:	0.007491
Recall@K:	0.057937
Collaborative filtering training set AUC:	0.955444
Collaborative filtering testing set AUC:	0.600403
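To make the two top-k metrics concrete, here is a minimal per-user sketch in plain Python. It illustrates the definitions only and is not LightFM's implementation; `precision_recall_at_k` and the restaurant ids are made up:

```python
def precision_recall_at_k(ranked_items, relevant_items, k=10):
    """Precision@k and recall@k for a single user's ranked list."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall

# One user: ten recommendations, two held-out restaurants (made-up ids)
ranked = ['r7', 'r2', 'r9', 'r4', 'r1', 'r8', 'r3', 'r6', 'r5', 'r0']
relevant = {'r2', 'r11'}
precision, recall = precision_recall_at_k(ranked, relevant, k=10)
```

A user with a single held-out restaurant can score at most 0.1 on precision@10, which is one reason precision stays low when each user has only a few test interactions.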


### 4.3 Hybrid LightFM Model

The LightFM hybrid model is a recommender that uses both collaborative and content-based filtering to make recommendations, which makes it a particularly useful method for building a recommendation system.

LightFM is a hybrid matrix factorisation model representing users and items as linear combinations of their content features’ latent factors. The model outperforms both collaborative and content-based models in cold-start or sparse interaction data scenarios (using both user and item metadata), and performs at least as well as a pure collaborative matrix factorisation model where interaction data is abundant. In LightFM, as in a collaborative filtering model, users and items are represented as latent vectors (embeddings). However, just as in a content-based (CB) model, these are entirely defined by functions (in this case, linear combinations) of embeddings of the content features that describe each product or user.
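The "linear combination of feature embeddings" idea can be sketched in a few lines of plain Python. The embedding values and the `item:42` identity feature name are made up for illustration, and the bias terms that LightFM also learns are omitted:

```python
# Made-up 2-dimensional embeddings for three item features
feature_embeddings = {
    'Seafood': [0.9, 0.1],
    'Chinese': [0.2, 0.8],
    'item:42': [0.05, -0.05],  # per-item identity feature (hypothetical name)
}

def item_embedding(features, feature_embeddings):
    """Sum the embeddings of all features that describe an item."""
    dims = len(next(iter(feature_embeddings.values())))
    total = [0.0] * dims
    for name in features:
        for d, value in enumerate(feature_embeddings[name]):
            total[d] += value
    return total

def score(user_vec, item_vec):
    """Dot product between a user vector and an item vector."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

item_vec = item_embedding(['Seafood', 'Chinese', 'item:42'], feature_embeddings)
user_vec = [1.0, 0.5]
affinity = score(user_vec, item_vec)
```

A new restaurant with no interactions still gets a usable vector from its category embeddings, which is where the cold-start benefit comes from.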

#### a) Fit the dataset

In [56]:
# Create a Dataset instance to hold the interaction matrix
dataset2 = Dataset()

# Use fit method to create users/restaurants id mappings
dataset2.fit(users = data['userid'], items = data['businessid'], item_features = all_restaurants_category)

# Number of unique user & item in the data
num_users, num_restaurant = dataset2.interactions_shape()
print(f'Number of users : {num_users}, Number of restaurants : {num_restaurant}.')

Number of users : 5482, Number of restaurants : 1150.


In [57]:
# Convert restaurants category into item matrix using build_item_features method

item_features = dataset2.build_item_features((x,y) for x,y in zip(data['businessid'], restaurants_category))

In [58]:
item_features.todense().shape

(1150, 1293)
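The second dimension (1293) is larger than the number of categories because `Dataset.fit` by default (`item_identity_features=True`) also creates one identity feature per restaurant. With 1150 restaurants, that would leave 143 category columns; this count is inferred from the shape, not taken from the category list. The sketch below builds such a matrix in plain Python, including the per-row normalisation that `build_item_features` applies by default; `build_feature_rows` and the toy data are hypothetical:

```python
def build_feature_rows(item_ids, item_categories, all_categories):
    """Return one row per item: identity columns first, then category columns."""
    n_items = len(item_ids)
    col_of = {cat: n_items + j for j, cat in enumerate(all_categories)}
    rows = []
    for idx, item in enumerate(item_ids):
        row = [0.0] * (n_items + len(all_categories))
        row[idx] = 1.0                  # identity feature for this item
        for cat in item_categories[item]:
            row[col_of[cat]] = 1.0      # category feature
        total = sum(row)
        rows.append([v / total for v in row])  # normalise the row to sum to 1
    return rows

# Two toy items and two toy categories (made-up data)
items = ['a', 'b']
cats = {'a': ['Seafood', 'Chinese'], 'b': ['Seafood']}
matrix = build_feature_rows(items, cats, ['Chinese', 'Seafood'])
```

The toy matrix has shape (2, 4): two identity columns plus two category columns.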

In [59]:
# Build interaction matrix

(interactions2, weight) = dataset2.build_interactions([(x['userid'],
                                                       x['businessid'],
                                                       x['user_rating']) for index,x in data.iterrows()])
                                                    
# Train - Test Split
# LightFM expects the train and test sets to have the same dimensions,
# so a conventional row-wise train-test split will not work.
# cross_validation.random_train_test_split splits the interaction data into two disjoint training and test sets
                                                    
train_interactions2, test_interactions2 = cross_validation.random_train_test_split(interactions2, test_percentage = 0.25, random_state = np.random.RandomState(42))

In [60]:
print(f"Shape of train interactions: {train_interactions2.shape}")
print(f"Shape of test interactions: {test_interactions2.shape}")

Shape of train interactions: (5482, 1150)
Shape of test interactions: (5482, 1150)


#### b) Fit the Model

In [61]:
# Fit the model

model2 = LightFM(loss = 'warp', no_components = no_components, item_alpha = item_alpha, learning_rate = learning_rate, random_state = np.random.RandomState(42))

%time model2 = model2.fit(interactions = train_interactions2, item_features = item_features, epochs = no_epochs, num_threads = no_threads)

Wall time: 1.19 s


#### c) Model Evaluation

In [62]:
# Train & Test AUC Score
train_auc_lfmh = auc_score(model2, train_interactions2, item_features = item_features).mean()
test_auc_lfmh = auc_score(model2, test_interactions2, item_features = item_features).mean()

# Test Precision & Recall Score
precision_lfmh = lightfm_prec_at_k(model2, test_interactions2, train_interactions2, item_features = item_features, k = 10).mean()
recall_lfmh = lightfm_recall_at_k(model2, test_interactions2, train_interactions2, item_features = item_features, k = 10).mean()

print(
    "\n------ Using LightFM evaluation methods with categories ------",
    f"Precision@K:\t{precision_lfmh:.6f}",
    f"Recall@K:\t{recall_lfmh:.6f}", 
    f"Hybrid LightFM training set AUC:\t{train_auc_lfmh:.6f}", 
    f"Hybrid LightFM filtering testing set AUC:\t{test_auc_lfmh:.6f}", 
    sep='\n')


------ Using LightFM evaluation methods with categories ------
Precision@K:	0.009329
Recall@K:	0.059526
Hybrid LightFM training set AUC:	0.988701
Hybrid LightFM filtering testing set AUC:	0.635251


<b>Observation</b> : 

1. The Hybrid LightFM model, which uses item features (restaurant categories) in addition to the user-restaurant interactions, performed better than the Pure Collaborative Filtering model, which uses the interactions alone. This highlights the benefit of adding such features to the model.
2. Recall at k is the proportion of a user's relevant items that appear in the top-k recommendations. We computed recall at 10. The Hybrid LightFM model reaches a recall of about 6.0%, meaning that on average about 6.0% of each user's relevant items appear in the top 10 results, compared with about 5.8% for the Pure Collaborative Filtering model.
3. AUC measures the likelihood that a random relevant item is ranked higher than a random irrelevant item, so a higher AUC indicates a better ranking. On the test set, the Hybrid LightFM model (63.5%) performs better than the Pure Collaborative Filtering model (60.0%).
4. Thus, the Hybrid model is the one taken forward to hyperparameter tuning.
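The AUC definition in point 3 can be checked by hand on a tiny example. The sketch below counts, for one user, how often a relevant restaurant is scored above an irrelevant one; `auc` is a hypothetical helper and the scores are made up:

```python
def auc(scores, relevant):
    """Fraction of (relevant, irrelevant) pairs ranked in the right order."""
    pos = [s for item, s in scores.items() if item in relevant]
    neg = [s for item, s in scores.items() if item not in relevant]
    if not pos or not neg:
        raise ValueError('need at least one relevant and one irrelevant item')
    # A tie between a relevant and an irrelevant item counts as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up model scores for four restaurants; r1 and r2 are the relevant ones
scores = {'r1': 0.9, 'r2': 0.4, 'r3': 0.7, 'r4': 0.1}
user_auc = auc(scores, relevant={'r1', 'r2'})
```

Three of the four relevant/irrelevant pairs are ordered correctly here, so the per-user AUC is 0.75; a value of 0.5 corresponds to random ordering.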

### 4.3 Hybrid LightFM Model Hyperparameter Tuning

#### a) Defining variables

In [63]:
# model learning rate
learning_rate_tuning = [0.01, 0.05]

# no of latent factors
no_components_tuning = [3, 5]

# no of epochs to fit model
no_epochs = 200

# no of threads to fit model
no_threads = 32

# regularisation for item features
item_alpha_tuning = [1e-8, 1e-7]

# model learning type 
learning_schedule = ['adagrad', 'adadelta']

#### b) Hyperparameter Tuning

In [64]:
# Search over the hyperparameter values defined above using nested loops and find the best combination

para_tuning_auc = {}
para_tuning_precision = {}
para_tuning_recall = {}

for lr_tuning in learning_rate_tuning:
    for alpha_tuning in item_alpha_tuning:
        for component_tuning in no_components_tuning:
            for lr_schedule in learning_schedule:
                model = LightFM(loss='warp', no_components = component_tuning,
                                learning_rate = lr_tuning, item_alpha = alpha_tuning,
                                learning_schedule = lr_schedule, random_state=np.random.RandomState(42))

                # Fit and evaluate inside the innermost loop so that every learning schedule is scored
                model.fit(interactions = train_interactions2, item_features = item_features, epochs = no_epochs, num_threads = no_threads)
                eval_precision_lfmh = lightfm_prec_at_k(model, test_interactions2, train_interactions2, item_features = item_features, k = 10).mean()
                eval_recall_lfmh = lightfm_recall_at_k(model, test_interactions2, train_interactions2, item_features = item_features, k = 10).mean()
                eval_auc_lfmh = auc_score(model, test_interactions2, item_features = item_features).mean()

                para_tuning_precision[lr_tuning, alpha_tuning, component_tuning, lr_schedule] = eval_precision_lfmh
                para_tuning_recall[lr_tuning, alpha_tuning, component_tuning, lr_schedule] = eval_recall_lfmh
                para_tuning_auc[lr_tuning, alpha_tuning, component_tuning, lr_schedule] = eval_auc_lfmh
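A grid like this can also be enumerated with `itertools.product`, which keeps fitting and scoring in a single loop body so that every combination is evaluated exactly once. The sketch below uses a stand-in `evaluate` function with an arbitrary scoring rule (an assumption made only so the example runs without training a model):

```python
import itertools

grid = {
    'learning_rate': [0.01, 0.05],
    'item_alpha': [1e-8, 1e-7],
    'no_components': [3, 5],
    'learning_schedule': ['adagrad', 'adadelta'],
}

def evaluate(params):
    """Stand-in for fitting a model and returning a validation score."""
    # Hypothetical scoring rule, used only so the sketch runs end to end
    return params['no_components'] * 0.1 - params['learning_rate']

results = {}
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    results[values] = evaluate(params)

# The key with the highest score; ties go to the first combination visited
best_params = max(results, key=results.get)
```

In a real run, `evaluate` would fit a LightFM model with `params` and return the chosen metric on held-out interactions.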

In [65]:
# Get the hyperparameter combination that yields the highest AUC score

print(max(para_tuning_auc.items(), key=operator.itemgetter(1))[0])
print(f"auc = {para_tuning_auc[max(para_tuning_auc.items(), key=operator.itemgetter(1))[0]]}")

(0.01, 1e-08, 5, 'adadelta')
auc = 0.6726745367050171


In [66]:
# Get the hyperparameter combination that yields the highest Precision score

print(max(para_tuning_precision.items(), key=operator.itemgetter(1))[0])
print(f"precision = {para_tuning_precision[max(para_tuning_precision.items(), key=operator.itemgetter(1))[0]]}")

(0.01, 1e-07, 3, 'adadelta')
precision = 0.01461397111415863


In [67]:
# Get the hyperparameter combination that yields the highest Recall score

print(max(para_tuning_recall.items(), key=operator.itemgetter(1))[0])
print(f"recall = {para_tuning_recall[max(para_tuning_recall.items(), key=operator.itemgetter(1))[0]]}")

(0.01, 1e-07, 3, 'adadelta')
recall = 0.09280876170382933


<b>Observation</b> : 

1. Tuning improves the test AUC only modestly (from about 0.635 to about 0.673). In this case, we choose the hyperparameter combination that gives the best recall, as recall represents the proportion of relevant items found in the top-k recommendations.

#### c) Tuned Hyperparameter

In [68]:
# model learning rate
learning_rate = 0.01

# no of latent factors
no_components = 3

# no of epochs to fit model
no_epochs = 1000

# no of threads to fit model
no_threads = 32

# regularisation for item features
item_alpha = 1e-07

learning_schedule = 'adadelta'

#### d) Fit the Model

In [69]:
# Fit the model

model_tuned = LightFM(loss = 'warp', no_components = no_components, item_alpha = item_alpha, learning_rate = learning_rate, learning_schedule = learning_schedule, random_state = np.random.RandomState(42))

%time model_tuned = model_tuned.fit(interactions = train_interactions2, item_features = item_features, epochs = no_epochs, num_threads = no_threads)

Wall time: 28.6 s


#### e) Model Evaluation

In [70]:
# Train & Test AUC Score
train_auc = auc_score(model_tuned, train_interactions2, item_features = item_features).mean()
test_auc = auc_score(model_tuned, test_interactions2, item_features = item_features).mean()

# Test Precision & Recall Score
precision_lfmh_tuned = lightfm_prec_at_k(model_tuned, test_interactions2, train_interactions2, item_features = item_features, k = 10).mean()
recall_lfmh_tuned = lightfm_recall_at_k(model_tuned, test_interactions2, train_interactions2, item_features = item_features, k = 10).mean()

print(
    "\n------ Using LightFM evaluation methods with categories ------",
    f"Precision@K:\t{precision_lfmh_tuned:.6f}",
    f"Recall@K:\t{recall_lfmh_tuned:.6f}", 
    f"Hybrid LightFM training set AUC:\t{train_auc:.6f}", 
    f"Hybrid LightFM filtering testing set AUC:\t{test_auc:.6f}", 
    sep='\n')


------ Using LightFM evaluation methods with categories ------
Precision@K:	0.016498
Recall@K:	0.104202
Hybrid LightFM training set AUC:	0.959065
Hybrid LightFM filtering testing set AUC:	0.672701


### 4.4 Hybrid LightFM Model Production

#### a) Tuned Hyperparameter

In [71]:
# model learning rate
learning_rate = 0.01

# no of latent factors
no_components = 3

# no of epochs to fit model
no_epochs = 1000

# no of threads to fit model
no_threads = 32

# regularisation for item features
item_alpha = 1e-07

learning_schedule = 'adadelta'

#### b) Fit the Model

In [72]:
final_model = LightFM(loss = 'warp', no_components = no_components, item_alpha = item_alpha, learning_rate = learning_rate, learning_schedule = learning_schedule, random_state = np.random.RandomState(42))

%time final_model = final_model.fit(interactions = interactions2, item_features = item_features, epochs = no_epochs, num_threads = no_threads)

Wall time: 37.9 s


In [73]:
# Mapping between internal and external representation of the user and item

uid_map2, ufeature_map2, iid_map2, ifeature_map2 = dataset2.mapping()
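The item mapping goes from external business ids to the internal indices used by the model, so turning model output back into business ids needs the inverse mapping. A minimal sketch with a made-up mapping of the same shape as `iid_map2`:

```python
# Toy mapping: external business id -> internal item index (made-up ids)
iid_map = {'biz_a': 0, 'biz_b': 1, 'biz_c': 2}

# Invert it so that internal indices can be translated back to business ids
index_to_id = {index: external for external, index in iid_map.items()}

# Example: translate a ranked list of internal indices
top_indices = [2, 0]
top_ids = [index_to_id[i] for i in top_indices]
```

The dataframe built in the next step serves the same purpose, with the added benefit that it can be merged with the restaurant details.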

#### c) Create a Dataframe of Mapped Business ID & Merged with Restaurant Dataframe

In [74]:
# Create a dataframe of mapped business id

business_id = pd.DataFrame(list(iid_map2.items()), columns = ['businessid', 'itemID'])
business_id.head()

Unnamed: 0,businessid,itemID
0,vVqxGrqt5ALxQjJGnntpKQ,0
1,7pVbUENiUjg6u6BWKAnxgA,1
2,6LgTc7CZlXCd1Pxq3BiVcw,2
3,wyDfBs1tYSIiBO7HCPKNTg,3
4,xsaHJx_tkVj1RArC2Fr3PA,4


In [75]:
# Combine mapped business id dataframe & restaurant details dataframe

restaurant_name = restaurant_name.merge(business_id, left_on = 'id', right_on= 'businessid')
restaurant_name.drop(columns = 'id', inplace = True)
restaurant_name

Unnamed: 0,name,address,price,lat,long,cluster,businessid,itemID
0,Burnt Ends,"20 Teck Lim Rd, Singapore 088391, Singapore",Expensive,1.280560,103.841750,4,vVqxGrqt5ALxQjJGnntpKQ,0
1,Holycrab,"2 Tan Quee Lan St, #01-03, Singapore 188091, S...",Expensive,1.298070,103.856970,9,7pVbUENiUjg6u6BWKAnxgA,1
2,Jumbo Seafood,"20 Upper Circular Road, #B1-48, Singapore 0584...",Expensive,1.288929,103.848374,17,6LgTc7CZlXCd1Pxq3BiVcw,2
3,Two Men Bagel House,"16 Enggor St, #01-12, Singapore 079717, Singapore",Inexpensive,1.274531,103.844383,4,wyDfBs1tYSIiBO7HCPKNTg,3
4,Sungei Road Laksa,"27 Jalan Berseh, #01-100, Jin Shui Kopitiam, S...",Inexpensive,1.306734,103.857772,5,xsaHJx_tkVj1RArC2Fr3PA,4
...,...,...,...,...,...,...,...,...
1145,Shabu Sai Orchard Central,"181 Orchard Road, #08-09/10/11, Singapore 2388...",Moderate,1.300928,103.839931,0,IVzQtMzGGflTH4n9N_YvxA,1145
1146,Whampoa Keng,"556 Balestier Rd, Singapore 329872, Singapore",Expensive,1.326790,103.844920,18,V4NE-dClEwighjCU4g58Bw,1146
1147,People's Park Food Centre,"32 New Market Rd, Singapore 050032, Singapore",Inexpensive,1.285075,103.842549,17,H4zyYcdM2whAPfc5P2LetQ,1147
1148,Guksu,"3 Temasek Boulevard, #02-385, Singapore 038983...",Inexpensive,1.293201,103.857066,9,MFUO1UE0r-ydCZGiELh8zA,1148


## 5. Similar Restaurants

### 5.1 Item Affinity

In [76]:
# Retrieve item-item affinity 

_, item_embeddings = final_model.get_item_representations(features = item_features)
item_embeddings

array([[ 2.9087484 ,  1.9355083 , -0.41116494],
       [ 0.12587544, -2.3022    ,  3.8375108 ],
       [ 0.8413675 , -1.7755299 ,  3.4813526 ],
       ...,
       [-2.741607  , -1.0637835 ,  2.6924846 ],
       [ 3.3791013 , -1.9223272 ,  1.1922656 ],
       [ 0.3594883 , -3.7643352 ,  3.347769  ]], dtype=float32)
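Item-to-item affinity can be computed from these embeddings with cosine similarity. The similarity scores reported in the case study below lie between -1 and 1, which is consistent with that measure, although the sketch here is a plain-Python illustration rather than the `similar_items` implementation. It uses the first three embedding rows shown above; `cosine_similarity` and `most_similar` are hypothetical helpers:

```python
from math import sqrt

# First three rows of the item embeddings shown above
embeddings = [
    [2.9087484, 1.9355083, -0.41116494],   # item 0
    [0.12587544, -2.3022, 3.8375108],      # item 1
    [0.8413675, -1.7755299, 3.4813526],    # item 2
]

def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(item_id, embeddings):
    """Rank all other items by cosine similarity to the given item."""
    target = embeddings[item_id]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in enumerate(embeddings) if other != item_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

neighbours = most_similar(0, embeddings)
```

Among these three rows, items 1 and 2 point in nearly the same direction, while item 0 points away from both.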

#### a) Case Study : Burnt Ends (Item ID : 0)

In [77]:
# Retrieve similar items sorted by score

results = similar_items(item_id = 0, item_features = item_features, model = final_model, N = len(restaurant_name)-1)

results

Unnamed: 0,itemID,score
0,196,0.998415
1,34,0.996499
2,868,0.992598
3,141,0.992256
4,1039,0.990780
...,...,...
1144,720,-0.956911
1145,665,-0.957381
1146,719,-0.968152
1147,426,-0.978138


In [78]:
# Merge the results dataframe with restaurants detail dataframe 
# Display top 10 results

results = results.merge(restaurant_name, how = 'inner', on = 'itemID')

results.head(10)

Unnamed: 0,itemID,score,name,address,price,lat,long,cluster,businessid
0,196,0.998415,Sabio By The Sea,"31 Ocean Way, #01-02, Southern Islands 098367,...",Expensive,1.24777,103.841972,6,z4cZR4JeMWvlXMtxirWBGg
1,34,0.996499,My Little Tapas Bar,"14 Ann Siang Road, Singapore 069694, Singapore",Expensive,1.281252,103.845539,4,fHO_A_zh85w1co791aKbaQ
2,868,0.992598,Stellar at 1-Altitude,"1 Raffles Place, #62, Singapore 048616, Singapore",Expensive,1.28433,103.85103,1,Zr0XTW-R1l5Hulk78_kC_Q
3,141,0.992256,Maison Ikkoku Restaurant & Bar,"1st Floor, 20 Kandahar St, Singapore 198885, S...",Moderate,1.302261,103.859612,14,J9rMt_V1NX49rU3YUjh_7Q
4,1039,0.99078,Don Quijote,"7 Dempsey Rd, #01-02, Singapore 249671, Singapore",Expensive,1.304618,103.808905,13,8yXIsscmSeUOn4UJ0pTtEw
5,664,0.989375,Marche Bar and Bistro,"252 North Bridge Road, #01-17A Raffles City Sh...",Moderate,1.29383,103.853332,9,nOqTSIeZoM-vRN-42PNagg
6,878,0.97872,JiBiru Craft Beer Bar,"313 Orchard Rd, #01-26, Singapore 238895, Sing...",Expensive,1.30105,103.838409,0,IUY-BllJGubN2vugqV7Fwg
7,445,0.975437,Lolla,"22 Ann Siang Road, Singapore 069702, Singapore",Expensive,1.28099,103.845573,4,x44o1_Dw4QsQZRtBJgwZaA
8,700,0.973136,Bitters & Love,"118 Telok Ayer St, Singapore 068587, Singapore",Expensive,1.28194,103.848328,4,RtgEHe_SCqZmjgX_DWJT3Q
9,180,0.970455,The Black Swan,"19 Cecil St, Singapore 049704, Singapore",Expensive,1.282523,103.850623,1,OLP_CJcooPYzXPlNGPPTXQ


In [79]:
# We can filter the result by price range

results[results['price'] == 'Inexpensive'].head(10)

Unnamed: 0,itemID,score,name,address,price,lat,long,cluster,businessid
56,40,0.897676,Mrs Pho,"349 Beach Rd, Singapore 199570, Singapore",Inexpensive,1.301422,103.861702,14,GGMka0ITgk7w_YUHUo8gqg
59,67,0.894545,Once Upon A Thyme,"10 Sinaran Drive, Square 2 #B1-124, Singapore ...",Inexpensive,1.321164,103.844692,18,ru4NwxMVGt9Tl0KcKiwPeg
86,109,0.862156,Astons Specialities,"180 Kitchener Rd, #04-14/16, Singapore 208539,...",Inexpensive,1.311263,103.856871,5,6E9iKBmWxBkfpicguzYEcw
112,1038,0.827538,NamNam Noodle Bar,"501 Orchard Rd, #B2-02, Singapore 238880, Sing...",Inexpensive,1.304952,103.831096,16,hfP5UJGaCQBEaBx-uabkDA
119,286,0.815209,Mos Cafe,"252 North Bridge Rd, #B1-38, Raffles City Shop...",Inexpensive,1.29383,103.85333,9,xNdlNb2A9ycXtGUrgvYIfA
126,568,0.803794,Tanjong Rhu Pau,"Blk 7 Jalan Batu, #01-113, Singapore 431007, S...",Inexpensive,1.30236,103.883583,12,pozfrkQ6Z_CWOM-luvZwLA
153,1031,0.77101,Namnam Noodle Bar,"68 Orchard Road, #01-55, Plaza Singapura, Sing...",Inexpensive,1.299695,103.845411,21,Pod_oVjja3Hg4ULdO6DxDQ
154,1071,0.769863,Bami Express,"240 Tanjong Pagar Road, 02-53, Singapore 08854...",Inexpensive,1.275047,103.842957,4,okCFL6lsVsOHAxHaxbENug
171,209,0.739139,French Ladle,"2 Pandan Valley, #01-206, Singapore 597626, Si...",Inexpensive,1.320431,103.778992,2,JIjKuhPdH3lSYbFP-4PV_Q
174,843,0.735161,Coco Ichibanya,"1 Vista Exchange Green, #02-06 The Star Vista,...",Inexpensive,1.30712,103.788452,2,cms6QxhbYoEcQNGjvvbqsA


In [80]:
# Save the restaurant names to a pickle file
restaurant_name.to_pickle('../data/restaurant.pkl')

# Save the item features to a pickle file (the context manager closes the file handle)
with open('../data/item_feature.pkl', 'wb') as f:
    pickle.dump(item_features, f)

# Save the hybrid model to a pickle file
with open('../data/lightfm_model.pkl', 'wb') as f:
    pickle.dump(final_model, f)
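
The saved artifacts can later be read back with `pickle.load`. The following is a minimal round-trip sketch, not the notebook's own code: the helper names `save_pickle` and `load_pickle` are hypothetical, and a small dictionary in a temporary directory stands in for `item_features` or `final_model`.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Serialize obj to path; the context manager closes the file handle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Load a pickled object. Only unpickle files that you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for item_features or final_model.
toy_artifact = {"itemID": [700, 180], "score": [0.973136, 0.970455]}

with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "toy_artifact.pkl"
    save_pickle(toy_artifact, artifact_path)
    restored = load_pickle(artifact_path)

print(restored == toy_artifact)
```

Loading the real model this way would presumably require the same `lightfm` version to be installed in the environment that reads the file.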

## 6. Conclusions

|Model|Train AUC Score|Test AUC Score|Precision (k = 10)|Recall (k = 10)|Remarks|
|---|---|---|---|---|---|
|Pure Collaborative Filtering|0.955444|0.600403|0.007491|0.057937|Without Hyperparameter Tuning| 
|Hybrid LightFM|0.988701|0.635251|0.009329|0.059526|Without Hyperparameter Tuning|
|Hybrid LightFM|0.953348|0.673842|0.015671|0.098008|With Hyperparameter Tuning|

Types of recommendation systems:

1. **Collaborative Filtering:** Collaborative filtering makes automatic predictions (filtering) about a user's interests by collecting preference or taste information from many users (collaborating). The underlying assumption is that if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than that of a randomly chosen person. A long-standing challenge for collaborative filtering is the cold-start problem: providing recommendations for new users or items that have no historical interaction record. The cold-start problem is common in real-world applications.

2. **Content-Based Filtering:** These methods are based on a description of each item and a profile of the user's preferences. In a content-based recommendation system, keywords describe the items, and a user profile records the types of item the user likes. In other words, the algorithm tries to recommend products that are similar to the ones the user has liked in the past. The idea is that if you like an item, you will also like a ‘similar’ item.

3. **Hybrid:** A hybrid recommender system combines content-based and collaborative filtering methods, which can be more effective than either method alone in some cases. Hybrid approaches can be implemented in several ways: by making content-based and collaborative predictions separately and then combining them, or by adding content-based capabilities to a collaborative approach (and vice versa).
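
The first hybrid strategy mentioned above (separate predictions, then combined) can be sketched as a weighted blend of two score tables. This is only an illustration under stated assumptions: the function `blend_scores`, the weight `alpha` and all scores are hypothetical, and it is not how LightFM works internally, since LightFM instead learns embeddings for user and item features within a single model.

```python
def blend_scores(cf_scores, content_scores, alpha=0.5):
    """Weighted hybrid: alpha * collaborative score + (1 - alpha) * content score.

    Items missing from one source contribute 0.0 from that source, so a
    brand-new item with no interactions can still be ranked by its content.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    items = set(cf_scores) | set(content_scores)
    return {
        item: alpha * cf_scores.get(item, 0.0)
        + (1.0 - alpha) * content_scores.get(item, 0.0)
        for item in items
    }


# Hypothetical scores for one user; item "C" is new and has no interaction history.
cf_scores = {"A": 0.9, "B": 0.2}
content_scores = {"A": 0.4, "B": 0.6, "C": 0.9}

blended = blend_scores(cf_scores, content_scores, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

In this toy case the cold-start item "C" ranks above "B" even though the collaborative scores alone could not rank it at all.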


Types of data for building recommendation systems:

1. **Explicit feedback:** Explicit feedback is data in which users directly state their opinion of a product, such as ratings. It tells us directly whether a user likes a product or not.

2. **Implicit feedback:** With implicit feedback, we have no data on how the user rates a product, so preferences are inferred from behaviour instead. Examples of implicit feedback are clicks, watched movies, played songs, purchases or assigned tags.
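
To make the distinction concrete, explicit ratings can be reduced to implicit-style positive interactions by thresholding. The sketch below is an assumption-laden illustration, not the preprocessing used in this notebook: the function `to_implicit`, the threshold of 4.0, the user IDs and the ratings are all hypothetical.

```python
def to_implicit(ratings, threshold=4.0):
    """Keep (user, item) pairs whose explicit rating is at least `threshold`."""
    return {(user, item) for user, item, rating in ratings if rating >= threshold}


# Hypothetical explicit feedback as (user, restaurant, star rating) triples.
explicit_ratings = [
    ("u1", "Burnt Ends", 5.0),
    ("u1", "Holycrab", 2.0),
    ("u2", "Jumbo Seafood", 4.0),
]

positives = to_implicit(explicit_ratings)
print(sorted(positives))
```

The low rating is dropped rather than kept as a negative, which mirrors how implicit data records only what a user did, not what they disliked.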

As the table shows, the Hybrid LightFM model scores better on the test set than the Pure Collaborative Filtering model on every metric. After hyperparameter tuning, test AUC rises from 0.600 to 0.674, precision at k = 10 from 0.0075 to 0.0157, and recall at k = 10 from 0.058 to 0.098. This suggests that the Hybrid LightFM approach provides more accurate recommendations than the Pure Collaborative Filtering approach. In conclusion, the Similar Restaurant Recommendation System provides personalized recommendations for individual users.
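
For readers unfamiliar with the ranking metrics in the table, the sketch below computes precision and recall at k for a single user. It is a simplified illustration, not LightFM's implementation: the function name `precision_recall_at_k` and the item IDs are hypothetical, and the table reports averages over users rather than one user's values.

```python
def precision_recall_at_k(ranked_items, relevant_items, k=10):
    """Precision@k and recall@k for one user.

    precision@k = hits in the top k / k
    recall@k    = hits in the top k / number of relevant (held-out) items
    """
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall


# Hypothetical ranked recommendations and held-out test interactions.
ranked = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11"]
relevant = {"r3", "r42"}

precision, recall = precision_recall_at_k(ranked, relevant, k=10)
print(precision, recall)
```

One consequence of the definition is that a user with a single held-out interaction can reach a precision at k = 10 of at most 0.1, so small absolute precision values are expected when users have few test interactions.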

Lastly, to address the user cold-start problem, a Location-Based Recommendation System was built using the K-Means clustering algorithm. It takes the user's location at the time they use the app into account and recommends the top 10 nearby restaurants based on proximity. The Location-Based Recommendation System aims to provide a quick, lightweight service for passing users.
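
The location-based idea can be sketched as a two-step lookup: find the cluster whose centroid is closest to the user, then rank that cluster's restaurants by distance. This is a hedged sketch under assumptions, not the project's actual implementation: the function names, the user coordinates and the use of haversine distance are hypothetical, and the centroids are recomputed from four rows of the results table rather than taken from a fitted K-Means model.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def cluster_centroids(restaurants):
    """Mean latitude/longitude of the restaurants in each cluster."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for r in restaurants:
        s = sums[r["cluster"]]
        s[0] += r["lat"]
        s[1] += r["long"]
        s[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}


def recommend_nearby(user_lat, user_long, restaurants, top_n=10):
    """Names of up to top_n restaurants in the nearest cluster, closest first."""
    centroids = cluster_centroids(restaurants)
    nearest = min(
        centroids,
        key=lambda c: haversine_km(user_lat, user_long, *centroids[c]),
    )
    candidates = [r for r in restaurants if r["cluster"] == nearest]
    candidates.sort(
        key=lambda r: haversine_km(user_lat, user_long, r["lat"], r["long"])
    )
    return [r["name"] for r in candidates[:top_n]]


# Coordinates and cluster labels copied from rows of the results table.
restaurants = [
    {"name": "Bitters & Love", "lat": 1.28194, "long": 103.848328, "cluster": 4},
    {"name": "The Black Swan", "lat": 1.282523, "long": 103.850623, "cluster": 1},
    {"name": "Bami Express", "lat": 1.275047, "long": 103.842957, "cluster": 4},
    {"name": "Mrs Pho", "lat": 1.301422, "long": 103.861702, "cluster": 14},
]

# Hypothetical user position.
nearby = recommend_nearby(1.2800, 103.8450, restaurants, top_n=10)
print(nearby)
```

Restricting the search to one cluster keeps the lookup cheap, at the cost of possibly missing a closer restaurant that sits just across a cluster boundary.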


## 7. Future Works

1. Scale the recommender to include all subzone areas.
2. Use deep learning algorithms to produce more accurate recommendations.
3. Incorporate graph theory into the location-based recommender to optimize travelling routes.

## 8. References

1. https://towardsdatascience.com/sentiment-analysis-vader-or-textblob-ff25514ac540
2. https://github.com/microsoft/recommenders/blob/main/examples/02_model_hybrid/lightfm_deep_dive.ipynb
3. https://nycdatascience.com/blog/student-works/yelp-recommender-part-1/
4. https://making.lyst.com/lightfm/docs/index.html
