# Implementing the Model

In [6]:
# Importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [7]:
# Loading the data

df = pd.read_csv('C:/Users/umar/Documents/Machine Learning/Recommender System/Data Analysis/data.csv', index_col=0)
df.shape

df_title = pd.read_csv('C:/Users/umar/Documents/Machine Learning/Recommender System/Data/movie_titles.csv', header=None, names=['Movie_Id', 'Year', 'Name'])

In [8]:
df.head()

|   | Cust_Id | Movie_Id | Rating | Date |
|---|---|---|---|---|
| 0 | 1488844 | 1 | 3.0 | 2005-09-06 |
| 1 | 822109 | 1 | 5.0 | 2005-05-13 |
| 2 | 885013 | 1 | 4.0 | 2005-10-19 |
| 3 | 30878 | 1 | 4.0 | 2005-12-26 |
| 4 | 823519 | 1 | 3.0 | 2004-05-03 |


In [9]:
df.shape

(39967, 4)

In [10]:
# Importing the Surprise library

from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

In [11]:
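reader = Reader(rating_scale=(1, 5))  # load_from_df only needs the rating scale; Netflix ratings are 1-5 stars
data = Dataset.load_from_df(df[['Cust_Id', 'Movie_Id', 'Rating']], reader)  # columns must be user, item, rating, in that order
svd = SVD()  # SVD with default hyperparameters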

The data and the model for movie recommendation are ready, so the model can now be evaluated with 5-fold cross-validation.
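As a rough illustration of what `cv=5` means, the sketch below shuffles the row indices of the 39,967 ratings and slices them into five nearly equal folds; each fold serves once as the test set while the other four form the training set. Per Surprise's documentation, passing an integer `cv` to `cross_validate` makes it use its own `KFold` splitter, which does this internally. The function name `k_fold_indices` is hypothetical and this is a simplified standard-library sketch, not Surprise's code.

```python
import random

def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every index lands in exactly one test fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most 1: the first n_items % k folds get one extra index.
    fold_sizes = [n_items // k + (1 if f < n_items % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(39967, k=5):
    print(len(train_idx), len(test_idx))
# 31973 7994 for the first two folds, 31974 7993 for the other three
```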

In [24]:
# Run 5-fold cross-validation
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.1010  1.0943  1.1084  1.1088  1.1092  1.1043  0.0059  
MAE (testset)     0.8845  0.8786  0.8913  0.8952  0.9027  0.8905  0.0084  
Fit time          2.52    2.59    2.59    2.60    2.74    2.61    0.07    
Test time         0.06    0.06    0.06    0.06    0.06    0.06    0.00    


{'test_rmse': array([1.10095127, 1.09433747, 1.10840387, 1.10876494, 1.10920718]),
 'test_mae': array([0.88454936, 0.87855046, 0.89130148, 0.89524996, 0.90274011]),
 'fit_time': (2.5208308696746826,
  2.58854079246521,
  2.5889058113098145,
  2.602389097213745,
  2.737495183944702),
 'test_time': (0.05905413627624512,
  0.060042619705200195,
  0.059040069580078125,
  0.059041500091552734,
  0.060044050216674805)}
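The Mean and Std columns of the table summarize the per-fold arrays in the returned dictionary. The standard-library check below reproduces them from the copied fold scores and shows that Std is the population standard deviation (dividing by the number of folds); the sample standard deviation of the RMSE folds would be about 0.0065 instead.

```python
from statistics import mean, pstdev

# Per-fold scores copied from the dictionary returned by cross_validate above.
test_rmse = [1.10095127, 1.09433747, 1.10840387, 1.10876494, 1.10920718]
test_mae = [0.88454936, 0.87855046, 0.89130148, 0.89524996, 0.90274011]

for name, scores in (("RMSE", test_rmse), ("MAE", test_mae)):
    # pstdev divides by n (population std, the same as numpy's default ddof=0).
    print(f"{name}: mean={mean(scores):.4f} std={pstdev(scores):.4f}")
# RMSE: mean=1.1043 std=0.0059
# MAE: mean=0.8905 std=0.0084
```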

#### Conclusion: RMSE = 1.1043 and MAE = 0.8905 on a 1-5 rating scale, so on average the model's predictions are within about 0.9 stars of the actual rating.
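Both metrics compare each actual rating $r$ with its prediction $\hat{r}$: MAE averages $|r - \hat{r}|$, while RMSE takes the square root of the average of $(r - \hat{r})^2$, so large misses weigh more and RMSE is never smaller than MAE (which is why 1.1043 > 0.8905 here). A minimal standard-library sketch on hypothetical toy ratings follows; on real Surprise output, `surprise.accuracy.rmse` and `surprise.accuracy.mae` compute the same quantities from a list of predictions.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((r - r_hat)^2))."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: mean(|r - r_hat|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy ratings, not taken from the dataset.
true_ratings = [3.0, 5.0, 4.0]
predicted = [3.5, 4.0, 4.0]
print(round(rmse(true_ratings, predicted), 4))  # 0.6455
print(mae(true_ratings, predicted))             # 0.5
```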

In [13]:
# Train on the whole dataset (no held-out test set) before making recommendations
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0xcce7722f70>
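Surprise's documentation describes `SVD` (with the default `biased=True`) as predicting $\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u$ and fitting the biases and factor vectors by stochastic gradient descent, with `lr_all` and `reg_all` setting the learning rate and regularization used in every update. The sketch below is a simplified pure-Python version of that idea on hypothetical toy data, not Surprise's implementation: the names `train_biased_mf` and `predict` are made up, it shuffles ratings each epoch, and it uses 2 factors instead of Surprise's default 100.

```python
import math
import random

def train_biased_mf(ratings, n_factors=2, n_epochs=100, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + q_i . p_u with plain SGD (simplified sketch)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u, _, _ in ratings}               # user biases
    bi = {i: 0.0 for _, i, _ in ratings}               # item biases
    # Small random initial factor vectors, one per user and one per item.
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in bu}
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in bi}
    order = list(ratings)
    for _ in range(n_epochs):  # n_epochs: full passes over the training ratings
        rng.shuffle(order)
        for u, i, r in order:
            err = r - (mu + bu[u] + bi[i] + sum(p * q for p, q in zip(pu[u], qi[i])))
            # lr plays the role of lr_all, reg the role of reg_all.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                p, q = pu[u][f], qi[i][f]  # use the old values in both updates
                pu[u][f] += lr * (err * q - reg * p)
                qi[i][f] += lr * (err * p - reg * q)
    return mu, bu, bi, pu, qi

def predict(model, u, i, lo=1.0, hi=5.0):
    """Unknown users/items add no bias or factor term; the estimate is clipped to [lo, hi]."""
    mu, bu, bi, pu, qi = model
    est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
    if u in pu and i in qi:
        est += sum(p * q for p, q in zip(pu[u], qi[i]))
    return min(hi, max(lo, est))

# Hypothetical toy ratings: (user, movie, rating).
toy = [("a", "m1", 5), ("a", "m2", 4), ("b", "m1", 1), ("b", "m2", 2), ("c", "m1", 4), ("c", "m3", 5)]
model = train_biased_mf(toy, n_epochs=200, lr=0.02)
train_rmse = math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in toy) / len(toy))
print(round(train_rmse, 3))
```

Predicting the global mean for every toy rating gives an RMSE of 1.5, so the learned biases and factors should land well below that.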

## RECOMMENDING MOVIES

In [26]:
# Recommend movies to user Y (here Y = 1488844)

titles = df_title.copy()
titles['Predicted_Rating'] = titles['Movie_Id'].apply(lambda x: svd.predict(1488844, x).est)
titles = titles.sort_values(by=['Predicted_Rating'], ascending=False)
titles.head(10)

|   | Movie_Id | Year | Name | Predicted_Rating |
|---|---|---|---|---|
| 4 | 5 | 2004.0 | The Rise and Fall of ECW | 4.039186 |
| 5 | 6 | 1997.0 | Sick | 3.923807 |
| 13373 | 13374 | 1933.0 | Dinner at Eight | 3.891943 |
| 13376 | 13377 | 1963.0 | Winter Light | 3.884122 |
| 2 | 3 | 1997.0 | Character | 3.754032 |
| 4505 | 4506 | 1961.0 | Breakfast at Tiffany's | 3.64975 |
| 1 | 2 | 2004.0 | Isle of Man TT 2004 Review | 3.609378 |
| 13369 | 13370 | 2002.0 | Justice League: Paradise Lost | 3.609206 |
| 13374 | 13375 | 1963.0 | Andy Griffith Show: Classic Favorites | 3.569778 |
| 13368 | 13369 | 2002.0 | PRIDE Fighting Championships: Cold Fury 2 | 3.52653 |


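Conclusion: User 1488844's top-10 list includes classic romantic comedies and dramas such as "Breakfast at Tiffany's" and "Dinner at Eight", so this user may lean toward that kind of film. The list is mixed, though (it also contains wrestling and motorsport titles such as "The Rise and Fall of ECW" and "Isle of Man TT 2004 Review"), so the genre signal is weak.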
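One caveat with this ranking: it scores every title in `movie_titles.csv`, including movies the user has already rated (for example, `df.head()` shows that user 1488844 already rated Movie_Id 1). A minimal sketch of a top-N step that skips rated movies is below; `top_n_unseen` and its parameters are hypothetical names, and the toy score table stands in for a trained model.

```python
import heapq

def top_n_unseen(predict_fn, user_id, candidate_items, rated_items, n=10):
    """Return up to n (item, score) pairs, best first, skipping items the user already rated."""
    rated = set(rated_items)  # set membership keeps the filter O(1) per item
    scored = ((item, predict_fn(user_id, item)) for item in candidate_items if item not in rated)
    # nlargest avoids sorting the whole catalogue when only the top n are needed.
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])

# Toy usage with a hypothetical score table standing in for a trained model.
fake_scores = {1: 4.5, 2: 3.0, 3: 4.9, 4: 2.0}
print(top_n_unseen(lambda u, i: fake_scores[i], 1488844, [1, 2, 3, 4], rated_items=[3], n=2))
# [(1, 4.5), (2, 3.0)]
```

With the notebook's objects, this could be called as `top_n_unseen(lambda u, i: svd.predict(u, i).est, 1488844, df_title['Movie_Id'], df.loc[df['Cust_Id'] == 1488844, 'Movie_Id'])`; in pandas, the same filter can be expressed by negating `Series.isin` on the `Movie_Id` column.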

### Hyperparameter tuning

In [23]:
from surprise import SVD
from surprise.model_selection import GridSearchCV
param_grid = {'n_epochs': [5, 10, 15], 'lr_all': [0.001, 0.002, 0.005], 'reg_all': [0.4, 0.5, 0.6]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=5)
gs.fit(data)

# best RMSE score
print(gs.best_score['rmse'])

# combination of parameters that gave the best RMSE score
print(gs.best_params['rmse'])

1.1033564564981408
{'n_epochs': 15, 'lr_all': 0.005, 'reg_all': 0.4}
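`GridSearchCV` evaluates every combination in `param_grid` with the requested cross-validation, so the cost grows multiplicatively with the grid. The short standard-library count below shows that this grid means 27 combinations and, with `cv=5`, 135 SVD fits. It also flags that the reported best value of each parameter lies at an edge of its searched range, which hints that values outside the grid (more epochs, a larger learning rate, less regularization) might score better; that is a suggestion to test, not a result. For reference, Surprise documents the `SVD` defaults as `n_epochs=20`, `lr_all=0.005` and `reg_all=0.02`, so the untuned model evaluated earlier lies outside this grid.

```python
from itertools import product

# The same grid that was passed to GridSearchCV above.
param_grid = {'n_epochs': [5, 10, 15], 'lr_all': [0.001, 0.002, 0.005], 'reg_all': [0.4, 0.5, 0.6]}
cv_folds = 5

names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]
print(len(combinations))             # 27 parameter combinations
print(len(combinations) * cv_folds)  # 135 SVD fits in total

# Best parameters reported by gs.best_params['rmse'] above.
best = {'n_epochs': 15, 'lr_all': 0.005, 'reg_all': 0.4}
at_edge = [name for name, values in param_grid.items() if best[name] in (min(values), max(values))]
print(at_edge)  # ['n_epochs', 'lr_all', 'reg_all']
```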


In [21]:
algo = gs.best_estimator['rmse']
algo.fit(data.build_full_trainset())

<surprise.prediction_algorithms.matrix_factorization.SVD at 0xccea1d4e80>

In [25]:
# Recommend movies to user Y (here Y = 1488844) with the tuned model

titles = df_title.copy()
titles['Predicted_Rating'] = titles['Movie_Id'].apply(lambda x: algo.predict(1488844, x).est)
titles = titles.sort_values(by=['Predicted_Rating'], ascending=False)
titles.head(10)

|   | Movie_Id | Year | Name | Predicted_Rating |
|---|---|---|---|---|
| 13374 | 13375 | 1963.0 | Andy Griffith Show: Classic Favorites | 3.705632 |
| 4505 | 4506 | 1961.0 | Breakfast at Tiffany's | 3.696319 |
| 4 | 5 | 2004.0 | The Rise and Fall of ECW | 3.684469 |
| 13377 | 13378 | 1940.0 | His Girl Friday | 3.651687 |
| 13373 | 13374 | 1933.0 | Dinner at Eight | 3.63209 |
| 0 | 1 | 2003.0 | Dinosaur Planet | 3.624022 |
| 13376 | 13377 | 1963.0 | Winter Light | 3.622609 |
| 9211 | 9212 | 1994.0 | Sailor Moon S | 3.60731 |
| 13369 | 13370 | 2002.0 | Justice League: Paradise Lost | 3.587267 |
| 2 | 3 | 1997.0 | Character | 3.553991 |


CONCLUSION: The tuned model's top-10 list differs noticeably from the previous one.

For instance, "Andy Griffith Show: Classic Favorites" ranked 9th in the previous list but ranks 1st in the new one.

Hyperparameter tuning reduced the mean cross-validated RMSE from 1.1043 to 1.1034, a decrease of about 0.1%. This gain is small compared with the fold-to-fold standard deviation of 0.0059 reported by `cross_validate`, so the tuned model is only marginally more accurate.
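The arithmetic behind the RMSE comparison can be checked directly from the numbers printed above. The sketch below (standard library only) computes the absolute and relative gain and expresses the gain in units of the default model's fold-to-fold standard deviation. It is only a rough comparison: `cross_validate` and `GridSearchCV` each drew their own random folds, so the two RMSEs were not measured on identical splits.

```python
from statistics import pstdev

# Per-fold RMSEs of the default SVD (from cross_validate) and the tuned mean RMSE (from GridSearchCV).
default_folds = [1.10095127, 1.09433747, 1.10840387, 1.10876494, 1.10920718]
default_rmse = sum(default_folds) / len(default_folds)
tuned_rmse = 1.1033564564981408

gain = default_rmse - tuned_rmse
print(f"absolute gain: {gain:.5f}")                        # 0.00098
print(f"relative gain: {100 * gain / default_rmse:.2f}%")  # 0.09%
print(f"gain / fold std: {gain / pstdev(default_folds):.2f}")  # 0.17
```

A gain of roughly a sixth of one standard deviation is well within the noise between folds.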