# Recommendation System - Movie Recommendation
## This notebook outlines the concepts involved in building a complete recommendation system that recommends movies to users
## Movie Recommender System - A very simple clone of Netflix

This notebook uses the **MovieLens dataset** to build a model that **recommends movies** to end users. The data was collected by the GroupLens Research Project at the University of Minnesota.

This dataset consists of:
- **100,000 ratings** (1-5) from **943 users** on **1682 movies**
- Demographic information of the users (age, gender, occupation, etc.)

Dataset: 

### Import the libraries

In [50]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

### Download the dataset

In [2]:
#! wget https://raw.githubusercontent.com/subashgandyer/datasets/main/ml-100k/ml-100k.zip

In [3]:
#! unzip ml-100k.zip

### Load the dataset
### Reading users file
- u.user

- Check the column names in the dataset's README file
- The files have no header row, so pass the column names explicitly when reading each file with pandas
- Use these following columns
    - 'user_id', 'age', 'sex', 'occupation', 'zip_code'
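
A minimal sketch of reading `u.user`, assuming the archive was unzipped into `ml-100k/` by the cells above. The file is pipe-separated (`|`) with no header row; the helper name `load_users` and its default path are assumptions, not part of the original notebook.

```python
import pandas as pd

USER_COLUMNS = ["user_id", "age", "sex", "occupation", "zip_code"]


def load_users(path="ml-100k/u.user"):
    """Read u.user (hypothetical helper; the default path is an assumption)."""
    # Pipe-separated, no header row; zip_code stays a string so "02215" keeps its leading zero
    return pd.read_csv(path, sep="|", names=USER_COLUMNS, dtype={"zip_code": str})
```

Usage: `users = load_users()`.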

### Display the user data

In [6]:
users

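|     | user_id | age | sex | occupation | zip_code |
|-----|---------|-----|-----|------------|----------|
| 0   | 1   | 24  | M   | technician    | 85711 |
| 1   | 2   | 53  | F   | other         | 94043 |
| 2   | 3   | 23  | M   | writer        | 32067 |
| 3   | 4   | 24  | M   | technician    | 43537 |
| 4   | 5   | 33  | F   | other         | 15213 |
| ... | ... | ... | ... | ...           | ...   |
| 938 | 939 | 26  | F   | student       | 33319 |
| 939 | 940 | 32  | M   | administrator | 02215 |
| 940 | 941 | 20  | M   | student       | 97229 |
| 941 | 942 | 48  | F   | librarian     | 78209 |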


### Reading ratings file
- u.data
- Use the following columns
    - 'user_id', 'movie_id', 'rating', 'unix_timestamp'
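
`u.data` is tab-separated with the four columns listed above. As with the users file, the helper name and default path below are assumptions of this sketch. The timestamp is in seconds since the Unix epoch, so `pd.to_datetime(ratings.unix_timestamp, unit="s")` converts it if dates are needed.

```python
import pandas as pd

RATING_COLUMNS = ["user_id", "movie_id", "rating", "unix_timestamp"]


def load_ratings(path="ml-100k/u.data"):
    """Read u.data (hypothetical helper; the default path is an assumption)."""
    # Tab-separated, no header row
    return pd.read_csv(path, sep="\t", names=RATING_COLUMNS)
```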

### Display the Ratings data

### Reading items file
- u.item
- Use the following columns
    - 'movie_id', 'movie_title', 'release date', 'video release date', 'IMDb URL'
    - genre flags: 'unknown', 'Action', 'Adventure', 'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
      'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western'
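
`u.item` is also pipe-separated, but unlike the other files it is not UTF-8 encoded, so reading it with pandas' default encoding tends to fail on accented titles. A sketch that passes `encoding="latin-1"`; the helper name and default path are assumptions. The names `movie_id` and `movie_title` match the attributes used later (`items.movie_id`, `items.movie_title`).

```python
import pandas as pd

ITEM_COLUMNS = (["movie_id", "movie_title", "release date", "video release date", "IMDb URL", "unknown"]
                + ["Action", "Adventure", "Animation", "Children's", "Comedy", "Crime", "Documentary",
                   "Drama", "Fantasy", "Film-Noir", "Horror", "Musical", "Mystery", "Romance",
                   "Sci-Fi", "Thriller", "War", "Western"])


def load_items(path="ml-100k/u.item"):
    """Read u.item (hypothetical helper; the default path is an assumption)."""
    # Pipe-separated, no header row, Latin-1 rather than UTF-8 encoded
    return pd.read_csv(path, sep="|", names=ITEM_COLUMNS, encoding="latin-1")
```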

### Display the Items data

### Reading Training and Testing Ratings data
- Training
    - ua.base
- Testing
    - ua.test
- Use the following columns
    - 'user_id', 'movie_id', 'rating', 'unix_timestamp'
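
`ua.base` and `ua.test` share the tab-separated layout of `u.data`. This sketch (helper name and default paths are assumptions) also checks that no (user, movie) pair appears in both files, which any train/test split should guarantee.

```python
import pandas as pd

RATING_COLUMNS = ["user_id", "movie_id", "rating", "unix_timestamp"]


def load_split(base_path="ml-100k/ua.base", test_path="ml-100k/ua.test"):
    """Read a MovieLens train/test pair (hypothetical helper; default paths are assumptions)."""
    train = pd.read_csv(base_path, sep="\t", names=RATING_COLUMNS)
    test = pd.read_csv(test_path, sep="\t", names=RATING_COLUMNS)
    # A proper split never puts the same (user, movie) rating in both files
    overlap = train.merge(test, on=["user_id", "movie_id"])
    if not overlap.empty:
        raise ValueError(f"{len(overlap)} (user, movie) pairs appear in both files")
    return train, test
```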

### Display the Training and Testing Ratings data

### How many unique users?

### How many unique items / movies?
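
Counting distinct ids with `nunique` answers both questions; on the full `u.data` this should give 943 users and 1682 movies, as stated in the dataset description. The hypothetical helper below also returns the density of the ratings: for the full data that is 100,000 / (943 × 1,682) ≈ 0.063, so only about 6% of user-movie pairs are rated and the user-item matrix is mostly zeros.

```python
import pandas as pd


def describe_ratings(ratings: pd.DataFrame):
    """Return (number of users, number of movies, density) (hypothetical helper)."""
    n_users = ratings["user_id"].nunique()
    n_movies = ratings["movie_id"].nunique()
    # Fraction of all possible user-movie pairs that carry a rating
    density = len(ratings) / (n_users * n_movies)
    return n_users, n_movies, density
```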

### Create a User-Item Matrix
- Use a pivot table (movies as rows, users as columns, 0 where a user has not rated a movie)
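
A one-line sketch of a pivot that yields the layout displayed below: `pivot` puts movies on the rows and users on the columns, and `fillna(0)` marks unrated pairs with 0.0. `pivot` requires each (movie, user) pair to occur at most once, which holds for `u.data`. The helper name is hypothetical.

```python
import pandas as pd


def build_user_item_matrix(ratings: pd.DataFrame) -> pd.DataFrame:
    """Movies as rows, users as columns, 0.0 where no rating exists (hypothetical helper)."""
    return ratings.pivot(index="movie_id", columns="user_id", values="rating").fillna(0)
```

Usage: `user_movies_data = build_user_item_matrix(ratings)`.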

In [17]:
user_movies_data

| movie_id \ user_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 934 | 935 | 936 | 937 | 938 | 939 | 940 | 941 | 942 | 943 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5.0 | 4.0 | 0.0 | 0.0 | 4.0 | 4.0 | 0.0 | 0.0 | 0.0 | 4.0 | ... | 2.0 | 3.0 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 |
| 2 | 3.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 |
| 3 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 4.0 | ... | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 |
| 5 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1678 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1679 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1680 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1681 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


# 1. Content Filtering

### Data Preparation
Build a text metadata field (the genres of each movie) so that a TF-IDF vectorizer can be applied for content filtering

In [18]:
items

|  | movie_id | movie_title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | ... | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1677 | 1678 | Mat' i syn (1997) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?Mat%27+i+syn+... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1678 | 1679 | B. Monkey (1998) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?B%2E+Monkey+(... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 1679 | 1680 | Sliding Doors (1998) | 01-Jan-1998 |  | http://us.imdb.com/Title?Sliding+Doors+(1998) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1680 | 1681 | You So Crazy (1994) | 01-Jan-1994 |  | http://us.imdb.com/M/title-exact?You%20So%20Cr... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [19]:
items['metadata'] = ""

In [20]:
items

|  | movie_id | movie_title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | ... | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | metadata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1677 | 1678 | Mat' i syn (1997) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?Mat%27+i+syn+... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |
| 1678 | 1679 | B. Monkey (1998) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?B%2E+Monkey+(... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |  |
| 1679 | 1680 | Sliding Doors (1998) | 01-Jan-1998 |  | http://us.imdb.com/Title?Sliding+Doors+(1998) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |  |
| 1680 | 1681 | You So Crazy (1994) | 01-Jan-1994 |  | http://us.imdb.com/M/title-exact?You%20So%20Cr... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |


In [21]:
items.Action, type(items.Action)

(0       0
 1       1
 2       0
 3       1
 4       0
        ..
 1677    0
 1678    0
 1679    0
 1680    0
 1681    0
 Name: Action, Length: 1682, dtype: int64,
 pandas.core.series.Series)

In [22]:
def metadata_Action(x):
    if x == 1:
        return "Action"
    else:
        return " "

In [23]:
items['metadata_Action'] = items.Action.apply(metadata_Action)

In [24]:
items

|  | movie_id | movie_title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | ... | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | metadata | metadata_Action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |  |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  | Action |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  |  |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  | Action |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1677 | 1678 | Mat' i syn (1997) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?Mat%27+i+syn+... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |  |
| 1678 | 1679 | B. Monkey (1998) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?B%2E+Monkey+(... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |  |  |
| 1679 | 1680 | Sliding Doors (1998) | 01-Jan-1998 |  | http://us.imdb.com/Title?Sliding+Doors+(1998) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |  |  |
| 1680 | 1681 | You So Crazy (1994) | 01-Jan-1994 |  | http://us.imdb.com/M/title-exact?You%20So%20Cr... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |  |


In [25]:
def metadata_Adventure(x):
    if x == 1:
        return " Adventure "
    else:
        return " "
    
items['metadata_Adventure'] = items.Adventure.apply(metadata_Adventure)

In [26]:
items

|  | movie_id | movie_title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | ... | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | metadata | metadata_Action | metadata_Adventure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |  |  |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  | Action | Adventure |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  |  |  |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  | Action |  |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1677 | 1678 | Mat' i syn (1997) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?Mat%27+i+syn+... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |  |  |
| 1678 | 1679 | B. Monkey (1998) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?B%2E+Monkey+(... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 1 | 0 | 0 |  |  |  |
| 1679 | 1680 | Sliding Doors (1998) | 01-Jan-1998 |  | http://us.imdb.com/Title?Sliding+Doors+(1998) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 |  |  |  |
| 1680 | 1681 | You So Crazy (1994) | 01-Jan-1994 |  | http://us.imdb.com/M/title-exact?You%20So%20Cr... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  |  |  |


In [27]:
genres = ['Action', 'Adventure',
'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

In [28]:
def metadata_Animation(x):
    if x == 1:
        return " Animation "
    else:
        return " "
items['metadata_Animation'] = items.Animation.apply(metadata_Animation)

In [29]:
def metadata_Childrens(x):
    if x == 1:
        return " Children's "
    else:
        return " "
    
items["metadata_Children's"] = items["Children's"].apply(metadata_Childrens)

In [30]:
def metadata_Comedy(x):
    if x == 1:
        return " Comedy "
    else:
        return " "

items['metadata_Comedy'] = items.Comedy.apply(metadata_Comedy)

In [31]:
def metadata_Crime(x):
    if x == 1:
        return " Crime "
    else:
        return " "
    
items['metadata_Crime'] = items.Crime.apply(metadata_Crime)

In [32]:
def metadata_Documentary(x):
    if x == 1:
        return " Documentary "
    else:
        return " "
    
items['metadata_Documentary'] = items.Documentary.apply(metadata_Documentary)

In [33]:
def metadata_Drama(x):
    if x == 1:
        return " Drama "
    else:
        return " "
    
items['metadata_Drama'] = items.Drama.apply(metadata_Drama)

In [34]:
def metadata_Fantasy(x):
    if x == 1:
        return " Fantasy "
    else:
        return " "
    
items['metadata_Fantasy'] = items.Fantasy.apply(metadata_Fantasy)

In [35]:
def metadata_FilmNoir(x):
    if x == 1:
        return " Film-Noir "
    else:
        return " "
    
items['metadata_Film-Noir'] = items["Film-Noir"].apply(metadata_FilmNoir)

In [36]:
def metadata_Horror(x):
    if x == 1:
        return "Horror "
    else:
        return " "

items['metadata_Horror'] = items.Horror.apply(metadata_Horror)

In [37]:
def metadata_Musical(x):
    if x == 1:
        return " Musical "
    else:
        return " "
    
items['metadata_Musical'] = items.Musical.apply(metadata_Musical)

In [38]:
def metadata_Mystery(x):
    if x == 1:
        return " Mystery "
    else:
        return " "
    
items['metadata_Mystery'] = items.Mystery.apply(metadata_Mystery)

In [39]:
def metadata_Romance(x):
    if x == 1:
        return " Romance "
    else:
        return " "
    
items['metadata_Romance'] = items.Romance.apply(metadata_Romance)

In [40]:
def metadata_SciFi(x):
    if x == 1:
        return " Sci-Fi "
    else:
        return " "
    
items['metadata_Sci-Fi'] = items["Sci-Fi"].apply(metadata_SciFi)

In [41]:
def metadata_Thriller(x):
    if x == 1:
        return " Thriller "
    else:
        return " "
    
items['metadata_Thriller'] = items.Thriller.apply(metadata_Thriller)

In [42]:
def metadata_War(x):
    if x == 1:
        return " War "
    else:
        return " "
    
items['metadata_War'] = items.War.apply(metadata_War)

In [43]:
def metadata_Western(x):
    if x == 1:
        return " Western "
    else:
        return " "
    
items['metadata_Western'] = items.Western.apply(metadata_Western)

In [44]:
items

|  | movie_id | movie_title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | ... | metadata_Fantasy | metadata_Film-Noir | metadata_Horror | metadata_Musical | metadata_Mystery | metadata_Romance | metadata_Sci-Fi | metadata_Thriller | metadata_War | metadata_Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... |  |  |  |  |  |  |  | Thriller |  |  |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  | Thriller |  |  |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  | Thriller |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1677 | 1678 | Mat' i syn (1997) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?Mat%27+i+syn+... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  |  |
| 1678 | 1679 | B. Monkey (1998) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?B%2E+Monkey+(... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  | Romance |  | Thriller |  |  |
| 1679 | 1680 | Sliding Doors (1998) | 01-Jan-1998 |  | http://us.imdb.com/Title?Sliding+Doors+(1998) | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  | Romance |  |  |  |  |
| 1680 | 1681 | You So Crazy (1994) | 01-Jan-1994 |  | http://us.imdb.com/M/title-exact?You%20So%20Cr... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  |  |


In [45]:
# Join the per-genre metadata columns (built above from the `genres` list) into one string per movie
items['full_metadata'] = items[['metadata_' + genre for genre in genres]].apply(
    lambda x: ' '.join(x), axis=1)

In [46]:
items

|  | movie_id | movie_title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | ... | metadata_Film-Noir | metadata_Horror | metadata_Musical | metadata_Mystery | metadata_Romance | metadata_Sci-Fi | metadata_Thriller | metadata_War | metadata_Western | full_metadata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... |  |  |  |  |  |  |  |  |  | Animation Children's Comedy ... |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... |  |  |  |  |  |  | Thriller |  |  | Action Adventure ... |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  | Thriller |  |  | Thriller |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  | Action Comedy Drama ... |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  | Thriller |  |  | Crime Drama Thr... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1677 | 1678 | Mat' i syn (1997) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?Mat%27+i+syn+... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  | Drama |
| 1678 | 1679 | B. Monkey (1998) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?B%2E+Monkey+(... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  | Romance |  | Thriller |  |  | Romance Thrille... |
| 1679 | 1680 | Sliding Doors (1998) | 01-Jan-1998 |  | http://us.imdb.com/Title?Sliding+Doors+(1998) | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  | Romance |  |  |  |  | Drama Romance |
| 1680 | 1681 | You So Crazy (1994) | 01-Jan-1994 |  | http://us.imdb.com/M/title-exact?You%20So%20Cr... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  | Comedy |
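
The eighteen near-identical `metadata_*` functions above can be collapsed into a single pass over the genre flag columns. A sketch (the helper name is hypothetical) that keeps, for each movie, the names of the genres flagged 1, joined by single spaces. Unlike the cells above it adds no padding spaces; a TF-IDF tokenizer ignores those anyway.

```python
import pandas as pd

GENRES = ["Action", "Adventure", "Animation", "Children's", "Comedy", "Crime", "Documentary",
          "Drama", "Fantasy", "Film-Noir", "Horror", "Musical", "Mystery", "Romance",
          "Sci-Fi", "Thriller", "War", "Western"]


def build_full_metadata(items: pd.DataFrame, genres=GENRES) -> pd.Series:
    """Space-separated names of the genres flagged 1 for each movie (hypothetical helper)."""
    flags = items[list(genres)].astype(bool)
    # For each row, keep the column labels whose flag is True
    return flags.apply(lambda row: " ".join(row.index[row.to_numpy()]), axis=1)
```

Usage: `items['full_metadata'] = build_full_metadata(items)`.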


### TF-IDF Vectorizer on Metadata
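
A minimal sketch, assuming scikit-learn: TF-IDF on `full_metadata`, then truncated SVD to obtain a dense content latent matrix indexed by title, comparable to the collaborative one in the next section. The custom `token_pattern` is an assumption of this sketch: the default pattern would split `Sci-Fi` and `Film-Noir` into two tokens each and reduce `Children's` to `children`. The helper name is hypothetical.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def content_latent_matrix(items: pd.DataFrame, n_components: int = 200, random_state: int = 42) -> pd.DataFrame:
    """TF-IDF on the genre text, compressed with truncated SVD (hypothetical helper)."""
    # Keep hyphenated/apostrophe genres ("sci-fi", "film-noir", "children's") as single tokens
    tfidf = TfidfVectorizer(token_pattern=r"[A-Za-z][\w'-]+")
    tfidf_matrix = tfidf.fit_transform(items["full_metadata"])
    # Keep n_components below the matrix's smaller dimension to stay within TruncatedSVD's limits
    k = min(n_components, min(tfidf_matrix.shape) - 1)
    svd = TruncatedSVD(n_components=k, random_state=random_state)
    latent = svd.fit_transform(tfidf_matrix)
    return pd.DataFrame(latent, index=items["movie_title"])
```

For example, `latent_matrix_1_df = content_latent_matrix(items)` (the variable name is hypothetical). With at most 18 genre tokens the result has at most 17 columns, because of the cap on `n_components`.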

# 2. Collaborative Filtering
- Use user_movies_data
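
A minimal sketch of how a matrix like `latent_matrix_2_df` (one 200-dimensional row per movie title) can be obtained, assuming scikit-learn's truncated SVD applied to `user_movies_data`. The notebook's exact method is not shown, and the helper name is hypothetical. The fitted `TruncatedSVD` is returned as well, so that its explained variance can be inspected.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD


def collaborative_latent_matrix(user_movies_data: pd.DataFrame, titles, n_components=200, random_state=42):
    """Reduce the movie x user rating matrix to dense latent vectors (hypothetical helper)."""
    # Keep n_components below the matrix's smaller dimension to stay within TruncatedSVD's limits
    k = min(n_components, min(user_movies_data.shape) - 1)
    svd = TruncatedSVD(n_components=k, random_state=random_state)
    latent = svd.fit_transform(user_movies_data.to_numpy())
    return pd.DataFrame(latent, index=list(titles)), svd
```

To keep titles aligned with the pivot's `movie_id` rows, pass `items.set_index('movie_id').loc[user_movies_data.index, 'movie_title']` as `titles`.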

In [55]:
latent_matrix_2_df

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Toy Story (1995) | 61.469396 | 21.359470 | -3.697670 | -2.579255 | 22.229734 | 15.965819 | 12.771045 | -2.750646 | 6.317215 | 0.425687 | ... | 1.347709 | -0.184266 | -0.640645 | -1.549552 | -0.059720 | 3.182577 | 2.403422 | -0.836539 | 0.390445 | -0.230482 |
| GoldenEye (1995) | 22.537180 | 1.719990 | -13.616243 | -0.514688 | -6.448353 | -2.408713 | 0.014406 | -1.979768 | -8.063023 | -4.683462 | ... | -0.240958 | 0.027552 | 2.250576 | -0.724372 | 0.502497 | -0.395244 | 1.642864 | -0.856053 | -1.490967 | -0.794249 |
| Four Rooms (1995) | 12.767067 | 7.006769 | -2.535840 | -7.766791 | -0.696417 | -4.604113 | -0.299384 | -2.450665 | 2.594067 | -3.198489 | ... | -0.467804 | 0.481890 | 0.368495 | 0.663168 | -0.398236 | 0.236723 | -1.673457 | -1.013456 | -1.018775 | 1.014273 |
| Get Shorty (1995) | 38.407419 | -3.195162 | -5.761061 | -6.843529 | -6.046998 | -6.969559 | -0.931458 | 3.538108 | -5.464838 | -6.086707 | ... | 0.230318 | -0.398540 | 0.397797 | 0.179542 | 1.253250 | 2.233004 | 1.486998 | -0.236360 | -0.602323 | 0.237315 |
| Copycat (1995) | 13.842238 | 3.748584 | -5.764236 | -2.849086 | -2.476795 | -4.737294 | -6.224761 | 0.061703 | 2.037478 | 5.640028 | ... | -0.445169 | -0.940128 | 0.273877 | -0.851864 | -2.118579 | 0.682812 | -0.451445 | 1.081720 | 0.116213 | -0.655582 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Mat' i syn (1997) | 0.009730 | 0.054860 | 0.058059 | 0.075857 | -0.094315 | -0.020940 | 0.021501 | 0.003022 | 0.021639 | -0.005757 | ... | -0.045739 | -0.008132 | -0.010650 | -0.026992 | 0.020977 | -0.038582 | -0.001206 | -0.018451 | -0.019929 | 0.000505 |
| B. Monkey (1998) | 0.029189 | 0.164580 | 0.174176 | 0.227572 | -0.282945 | -0.062819 | 0.064502 | 0.009065 | 0.064917 | -0.017272 | ... | -0.137216 | -0.024395 | -0.031949 | -0.080977 | 0.062932 | -0.115747 | -0.003619 | -0.055352 | -0.059788 | 0.001515 |
| Sliding Doors (1998) | 0.019459 | 0.109720 | 0.116117 | 0.151715 | -0.188630 | -0.041879 | 0.043002 | 0.006043 | 0.043278 | -0.011514 | ... | -0.091478 | -0.016263 | -0.021299 | -0.053985 | 0.041955 | -0.077165 | -0.002413 | -0.036901 | -0.039859 | 0.001010 |
| You So Crazy (1994) | 0.212086 | -0.025764 | -0.098975 | -0.012308 | -0.071658 | 0.018906 | -0.086135 | 0.047577 | -0.173649 | -0.001894 | ... | 0.139474 | 0.010305 | -0.003008 | -0.166552 | 0.133181 | -0.263650 | -0.130022 | -0.074526 | -0.075209 | 0.358798 |


### Plot the explained variance to decide how many latent dimensions to use
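
A fitted `TruncatedSVD` exposes `explained_variance_ratio_`; plotting its cumulative sum shows how much of the matrix's variance the first k dimensions keep. The hypothetical helper below draws that curve and returns the smallest k that reaches a chosen threshold (0.9 is an arbitrary illustration, not a recommendation).

```python
import numpy as np
from matplotlib import pyplot as plt


def plot_explained_variance(explained_variance_ratio, threshold=0.9):
    """Plot cumulative explained variance; return (smallest k reaching threshold or None, figure)."""
    cumulative = np.cumsum(explained_variance_ratio)
    # 0-based positions where the cumulative share has reached the threshold
    reached = np.nonzero(cumulative >= threshold)[0]
    n_needed = int(reached[0]) + 1 if reached.size else None
    fig, ax = plt.subplots()
    ax.plot(np.arange(1, len(cumulative) + 1), cumulative, marker=".")
    ax.axhline(threshold, linestyle="--", color="gray")
    ax.set_xlabel("Number of latent dimensions")
    ax.set_ylabel("Cumulative explained variance ratio")
    return n_needed, fig
```

Pass it the `explained_variance_ratio_` of whichever fitted SVD produced the latent matrix. A `None` result means even all computed dimensions stay below the threshold.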

### Cosine Similarity
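
The commented steps in the next cell can be sketched as one function. It assumes two latent DataFrames indexed by movie title with identical row order: a content matrix (for example, built from TF-IDF on `full_metadata`) and the collaborative `latent_matrix_2_df`. The name `similar_movies` is hypothetical. The hybrid score here is the plain average of the two cosine similarities, which is consistent with the `hybrid` column in the outputs that follow, e.g. (1.0 + 0.341084) / 2 = 0.670542.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def similar_movies(title, content_latent, collab_latent, sort_by="hybrid", k=10):
    """Content, collaborative and hybrid similarity of every movie to `title` (hypothetical helper)."""
    # Both matrices must list the same movies in the same row order
    if not content_latent.index.equals(collab_latent.index):
        raise ValueError("content and collaborative matrices must share the same index")
    # Latent vectors of the selected movie (double brackets keep them 2-D)
    content_vec = content_latent.loc[[title]]
    collab_vec = collab_latent.loc[[title]]
    scores = pd.DataFrame(
        {
            "content": cosine_similarity(content_vec, content_latent)[0],
            "collaborative": cosine_similarity(collab_vec, collab_latent)[0],
        },
        index=content_latent.index,
    )
    # Hybrid score: plain average of the two similarities
    scores["hybrid"] = (scores["content"] + scores["collaborative"]) / 2
    # Drop the query movie itself, then rank
    return scores.drop(index=title).sort_values(sort_by, ascending=False).head(k)
```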

In [57]:
from sklearn.metrics.pairwise import cosine_similarity
# take the latent vectors for a selected movie from both the content and collaborative matrices

# calculate the similarity of this movie with the others in the list

# hybrid score: the average of the content and collaborative similarities

# form a data frame of similar movies

# sort it by content, collaborative or hybrid score (here: content)


|  | content | collaborative | hybrid |
|---|---|---|---|
| Aladdin and the King of Thieves (1996) | 1.0 | 0.341084 | 0.670542 |
| Fox and the Hound, The (1981) | 0.936967 | 0.398674 | 0.667821 |
| Gumby: The Movie (1995) | 0.936967 | 0.120705 | 0.528836 |
| Land Before Time III: The Time of the Great Giving (1995) (V) | 0.936967 | 0.184578 | 0.560772 |
| Swan Princess, The (1994) | 0.936967 | 0.141618 | 0.539292 |
| Sword in the Stone, The (1963) | 0.936967 | 0.412281 | 0.674624 |
| Oliver & Company (1988) | 0.936967 | 0.237972 | 0.587469 |
| Balto (1995) | 0.936967 | 0.18339 | 0.560178 |
| Aristocats, The (1970) | 0.936967 | 0.336412 | 0.636689 |
| Pinocchio (1940) | 0.936967 | 0.441239 | 0.689103 |


### Try Collaborative

|  | content | collaborative | hybrid |
|---|---|---|---|
| Star Wars (1977) | 8.326673e-17 | 0.760357 | 0.380179 |
| Independence Day (ID4) (1996) | 4.302114e-16 | 0.724174 | 0.362087 |
| Return of the Jedi (1983) | 8.326673e-17 | 0.720482 | 0.360241 |
| Rock, The (1996) | 1.235123e-15 | 0.696476 | 0.348238 |
| Mission: Impossible (1996) | 3.053113e-16 | 0.678037 | 0.339019 |
| Willy Wonka and the Chocolate Factory (1971) | 0.5168978 | 0.658802 | 0.58785 |
| Star Trek: First Contact (1996) | 1.144917e-16 | 0.656759 | 0.328379 |
| Twister (1996) | 1.235123e-15 | 0.652484 | 0.326242 |
| Jerry Maguire (1996) | -7.494005e-16 | 0.649482 | 0.324741 |
| Fargo (1996) | 1.859624e-15 | 0.646801 | 0.3234 |


### Try Hybrid

|  | content | collaborative | hybrid |
|---|---|---|---|
| Aladdin (1992) | 0.820798 | 0.615081 | 0.717939 |
| Pinocchio (1940) | 0.936967 | 0.441239 | 0.689103 |
| Sword in the Stone, The (1963) | 0.936967 | 0.412281 | 0.674624 |
| Beauty and the Beast (1991) | 0.752183 | 0.592164 | 0.672174 |
| Aladdin and the King of Thieves (1996) | 1.0 | 0.341084 | 0.670542 |
| Fox and the Hound, The (1981) | 0.936967 | 0.398674 | 0.667821 |
| Lion King, The (1994) | 0.752183 | 0.583445 | 0.667814 |
| Winnie the Pooh and the Blustery Day (1968) | 0.936967 | 0.388311 | 0.662639 |
| Beavis and Butt-head Do America (1996) | 0.818945 | 0.488329 | 0.653637 |
| Aristocats, The (1970) | 0.936967 | 0.336412 | 0.636689 |


# 3. Hybrid Recommendation System

In [1]:
def recommend_similar_movies(title):
    
    return None

In [2]:
recommend_similar_movies("Toy Story (1995)")

In [3]:
recommend_similar_movies("GoldenEye (1995)")

In [4]:
recommend_similar_movies("Mission: Impossible (1996)")

## Turicreate - a Python library for building recommendation engines easily

In [64]:
#! pip install turicreate

### Import the library

### Create an SFrame for training and testing data

### Popularity Recommender - Cold Start problem

### Recommend the top 5 movies to users 1, 2, 3, 4, 5
- Use Popularity model
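
Turicreate's own API is not reproduced here. As a library-free illustration of what a popularity recommender computes, the hypothetical helper below ranks movies by mean training rating (ties broken by rating count) and gives every user the same ranked list minus the movies they already rated. Because the list does not depend on the user's history, it also works for brand-new users, which is why popularity models are a common answer to the cold-start problem.

```python
import pandas as pd


def popularity_recommend(train: pd.DataFrame, users, k: int = 5, min_ratings: int = 1) -> pd.DataFrame:
    """Recommend the k highest-rated movies each user has not rated yet (hypothetical helper)."""
    # Popularity score: mean rating per movie; the rating count breaks ties
    stats = train.groupby("movie_id")["rating"].agg(["mean", "count"])
    stats = stats[stats["count"] >= min_ratings]
    ranking = stats.sort_values(["mean", "count"], ascending=False).index
    rows = []
    for user in users:
        seen = set(train.loc[train["user_id"] == user, "movie_id"])
        picks = [movie for movie in ranking if movie not in seen][:k]
        rows.extend({"user_id": user, "movie_id": movie, "rank": rank}
                    for rank, movie in enumerate(picks, start=1))
    return pd.DataFrame(rows, columns=["user_id", "movie_id", "rank"])
```

With `min_ratings=1`, a movie rated 5 by a single user outranks everything else; raising `min_ratings` trades that noise for a bias toward well-known titles.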

### Insights

- 
- 

### Item Similarity Recommender
#### Training the model

#### Making recommendations for users 1, 2, 3, 4, 5
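
A plain NumPy/pandas sketch of an item-similarity recommender (not Turicreate's implementation): build a user x movie matrix from the training ratings, compute the cosine similarity between movie columns, and score each unrated movie by the similarity-weighted average of the user's own ratings. The helper name and this weighting scheme are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def item_similarity_recommend(train: pd.DataFrame, users, k: int = 5) -> pd.DataFrame:
    """Item-item cosine-similarity recommender (hypothetical helper, not Turicreate's code)."""
    matrix = train.pivot_table(index="user_id", columns="movie_id", values="rating", fill_value=0)
    values = matrix.to_numpy(dtype=float)
    # Cosine similarity between movie columns; guard against all-zero columns
    norms = np.linalg.norm(values, axis=0)
    norms[norms == 0] = 1.0
    unit = values / norms
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)
    rows = []
    for user in users:
        if user not in matrix.index:
            continue  # no rating history: this model cannot score anything (cold start)
        r = values[matrix.index.get_loc(user)]
        rated = r > 0
        # Score = similarity-weighted average of the user's own ratings
        weights = sim[:, rated]
        denom = weights.sum(axis=1)
        scores = np.divide(weights @ r[rated], denom, out=np.zeros_like(denom), where=denom > 0)
        order = [i for i in np.argsort(-scores) if not rated[i]][:k]
        rows.extend({"user_id": user, "movie_id": matrix.columns[i], "score": scores[i], "rank": n}
                    for n, i in enumerate(order, start=1))
    return pd.DataFrame(rows, columns=["user_id", "movie_id", "score", "rank"])
```

Unlike a popularity model, this approach returns nothing for users without training ratings, which is the cold-start limitation in concrete form.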

### Insights
- 
- 

# 4. Matrix Factorization Recommender

- R – The user-movie rating matrix
- K – Number of latent features
- alpha – Learning rate for stochastic gradient descent
- beta – Regularization parameter for the biases and the latent feature vectors
- iterations – Number of iterations to perform stochastic gradient descent
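
Written out, the model that the `MF` class fits (read off its `get_rating` and `sgd` methods) is as follows. User $i$ and movie $j$ index the rows of `R`; $b$ is the mean observed rating (`self.b`), $b_{u,i}$ and $b_{m,j}$ are the user and movie biases (`self.b_u[i]`, `self.b_i[j]`), and $\mathbf{p}_i$, $\mathbf{q}_j$ are rows of `P` and `Q`:

$$
\hat{r}_{ij} = b + b_{u,i} + b_{m,j} + \mathbf{p}_i^{\top}\mathbf{q}_j, \qquad e_{ij} = r_{ij} - \hat{r}_{ij}
$$

Each SGD step on an observed rating $r_{ij} > 0$ moves the parameters against the gradient of $e_{ij}^2 + \beta\,(b_{u,i}^2 + b_{m,j}^2 + \lVert\mathbf{p}_i\rVert^2 + \lVert\mathbf{q}_j\rVert^2)$, with the factor 2 absorbed into $\alpha$. All right-hand sides use the values from before the step:

$$
\begin{aligned}
b_{u,i} &\leftarrow b_{u,i} + \alpha\,(e_{ij} - \beta\, b_{u,i}) \\
b_{m,j} &\leftarrow b_{m,j} + \alpha\,(e_{ij} - \beta\, b_{m,j}) \\
\mathbf{p}_i &\leftarrow \mathbf{p}_i + \alpha\,(e_{ij}\,\mathbf{q}_j - \beta\,\mathbf{p}_i) \\
\mathbf{q}_j &\leftarrow \mathbf{q}_j + \alpha\,(e_{ij}\,\mathbf{p}_i - \beta\,\mathbf{q}_j)
\end{aligned}
$$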


In [71]:
class MF():

    # Initializing the user-movie rating matrix, no. of latent features, alpha and beta.
    def __init__(self, R, K, alpha, beta, iterations):
        self.R = R
        self.num_users, self.num_items = R.shape
        self.K = K
        self.alpha = alpha
        self.beta = beta
        self.iterations = iterations

    # Train the model: initialize the user-feature (P) and movie-feature (Q) matrices, then run SGD
    def train(self):
        self.P = np.random.normal(scale=1./self.K, size=(self.num_users, self.K))
        self.Q = np.random.normal(scale=1./self.K, size=(self.num_items, self.K))

        # Initializing the bias terms
        self.b_u = np.zeros(self.num_users)
        self.b_i = np.zeros(self.num_items)
        self.b = np.mean(self.R[np.where(self.R != 0)])

        # List of training samples
        self.samples = [
        (i, j, self.R[i, j])
        for i in range(self.num_users)
        for j in range(self.num_items)
        if self.R[i, j] > 0
        ]

        # Stochastic gradient descent for given number of iterations
        training_process = []
        for i in range(self.iterations):
            np.random.shuffle(self.samples)
            self.sgd()
            mse = self.mse()
            training_process.append((i, mse))
            if (i+1) % 20 == 0:
                print("Iteration: %d ; error = %.4f" % (i+1, mse))

        return training_process

    # Root of the summed squared error over the observed (non-zero) ratings.
    # Note: this is not a mean; it is the value printed as "error" during training.
    def mse(self):
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        return np.sqrt(np.sum((self.R[xs, ys] - predicted[xs, ys]) ** 2))

    # Stochastic gradient descent to get optimized P and Q matrices
    def sgd(self):
        for i, j, r in self.samples:
            prediction = self.get_rating(i, j)
            e = (r - prediction)

            # Update the biases, penalizing their size with beta
            self.b_u[i] += self.alpha * (e - self.beta * self.b_u[i])
            self.b_i[j] += self.alpha * (e - self.beta * self.b_i[j])

            # Copy P[i] first so the Q[j] update uses the pre-update user factors
            P_i = self.P[i, :].copy()
            self.P[i, :] += self.alpha * (e * self.Q[j, :] - self.beta * self.P[i, :])
            self.Q[j, :] += self.alpha * (e * P_i - self.beta * self.Q[j, :])

    # Predicted rating for user i and movie j
    def get_rating(self, i, j):
        prediction = self.b + self.b_u[i] + self.b_i[j] + self.P[i, :].dot(self.Q[j, :].T)
        return prediction

    # Full predicted user-movie rating matrix: global mean + user bias + movie bias + P Q^T
    def full_matrix(self):
        return self.b + self.b_u[:, np.newaxis] + self.b_i[np.newaxis, :] + self.P.dot(self.Q.T)

In [72]:
# Users as rows, movies as columns; 0 marks a missing rating
R = ratings.pivot(index='user_id', columns='movie_id', values='rating').fillna(0).to_numpy()

In [73]:
mf = MF(R, K=20, alpha=0.001, beta=0.01, iterations=100)
training_process = mf.train()
print()
print("P x Q:")
print(mf.full_matrix())
print()

Iteration: 20 ; error = 296.1428
Iteration: 40 ; error = 291.0942
Iteration: 60 ; error = 287.7374
Iteration: 80 ; error = 282.3564
Iteration: 100 ; error = 273.1316

P x Q:
[[3.92105461 3.36490957 2.9845237  ... 3.27366076 3.4351097  3.43152819]
 [3.94417182 3.30137547 3.16908257 ... 3.37287006 3.51702066 3.49947715]
 [3.3759181  2.7003567  2.61436327 ... 2.78033145 2.90381586 2.90484641]
 ...
 [4.26196805 3.66159871 3.44827559 ... 3.63185631 3.77754697 3.74272589]
 [4.38492385 3.78049366 3.52312126 ... 3.79092683 3.89080227 3.86524256]
 [3.79231176 3.23934483 3.03230513 ... 3.25970117 3.38211532 3.25664447]]



## Surprise

### Import the libraries

In [74]:
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split

In [75]:
ratings

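|       | user_id | movie_id | rating | unix_timestamp |
|-------|---------|----------|--------|----------------|
| 0     | 196 | 242  | 3 | 881250949 |
| 1     | 186 | 302  | 3 | 891717742 |
| 2     | 22  | 377  | 1 | 878887116 |
| 3     | 244 | 51   | 2 | 880606923 |
| 4     | 166 | 346  | 1 | 886397596 |
| ...   | ... | ...  | ... | ...     |
| 99995 | 880 | 476  | 3 | 880175444 |
| 99996 | 716 | 204  | 5 | 879795543 |
| 99997 | 276 | 1090 | 1 | 874795795 |
| 99998 | 13  | 225  | 2 | 882399156 |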


In [76]:
items

|  | movie_id | movie_title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | ... | metadata_Film-Noir | metadata_Horror | metadata_Musical | metadata_Mystery | metadata_Romance | metadata_Sci-Fi | metadata_Thriller | metadata_War | metadata_Western | full_metadata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... |  |  |  |  |  |  |  |  |  | Animation Children's Comedy ... |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... |  |  |  |  |  |  | Thriller |  |  | Action Adventure ... |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  | Thriller |  |  | Thriller |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  | Action Comedy Drama ... |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  | Thriller |  |  | Crime Drama Thr... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1677 | 1678 | Mat' i syn (1997) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?Mat%27+i+syn+... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  | Drama |
| 1678 | 1679 | B. Monkey (1998) | 06-Feb-1998 |  | http://us.imdb.com/M/title-exact?B%2E+Monkey+(... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  | Romance |  | Thriller |  |  | Romance Thrille... |
| 1679 | 1680 | Sliding Doors (1998) | 01-Jan-1998 |  | http://us.imdb.com/Title?Sliding+Doors+(1998) | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  | Romance |  |  |  |  | Drama Romance |
| 1680 | 1681 | You So Crazy (1994) | 01-Jan-1994 |  | http://us.imdb.com/M/title-exact?You%20So%20Cr... | 0 | 0 | 0 | 0 | 0 | ... |  |  |  |  |  |  |  |  |  | Comedy |


In [77]:
Mapping_file = dict(zip(items.movie_title.tolist(), items.movie_id.tolist()))

### Instantiate a reader and read in our rating data

### Train SVD on 75% of the known ratings

### Check the accuracy using Root Mean Squared Error (RMSE)

In [79]:
def pred_user_rating(ui):
    if ui in ratings.user_id.unique():
        ui_list = ratings[ratings.user_id == ui].movie_id.tolist()
        d = {k: v for k,v in Mapping_file.items() if not v in ui_list}        
        predictedL = []
        for i, j in d.items():     
            predicted = algorithm.predict(ui, j)
            predictedL.append((i, predicted[3])) 
        pdf = pd.DataFrame(predictedL, columns = ['movies', 'ratings'])
        pdf.sort_values('ratings', ascending=False, inplace=True)  
        pdf.set_index('movies', inplace=True)    
        return pdf.head(10)        
    else:
        print("User Id does not exist in the list!")
        return None

In [81]:
pred_user_rating(50)

| movies | ratings |
|---|---|
| Godfather, The (1972) | 4.613636 |
| Silence of the Lambs, The (1991) | 4.570185 |
| Godfather: Part II, The (1974) | 4.547845 |
| Raging Bull (1980) | 4.542233 |
| Casablanca (1942) | 4.520886 |
| Wrong Trousers, The (1993) | 4.427827 |
| Close Shave, A (1995) | 4.427066 |
| Manchurian Candidate, The (1962) | 4.410475 |
| Boot, Das (1981) | 4.389995 |
| Rear Window (1954) | 4.385388 |


In [82]:
pred_user_rating(49)

| movies | ratings |
|---|---|
| Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963) | 4.179254 |
| Secrets & Lies (1996) | 4.080494 |
| 12 Angry Men (1957) | 3.905257 |
| Boot, Das (1981) | 3.847671 |
| Godfather, The (1972) | 3.832041 |
| Rear Window (1954) | 3.80438 |
| Strictly Ballroom (1992) | 3.804347 |
| Close Shave, A (1995) | 3.804085 |
| To Kill a Mockingbird (1962) | 3.788469 |
| Shall We Dance? (1996) | 3.783408 |


In [83]:
pred_user_rating(1)

| movies | ratings |
|---|---|
| Wings of Desire (1987) | 4.726772 |
| Chinatown (1974) | 4.655911 |
| Close Shave, A (1995) | 4.652174 |
| Waiting for Guffman (1996) | 4.61947 |
| Ran (1985) | 4.591864 |
| Casablanca (1942) | 4.54701 |
| Boot, Das (1981) | 4.462583 |
| Third Man, The (1949) | 4.462502 |
| Killing Fields, The (1984) | 4.452708 |
| Butch Cassidy and the Sundance Kid (1969) | 4.442452 |


In [84]:
pred_user_rating(915)

| movies | ratings |
|---|---|
| Usual Suspects, The (1995) | 4.267505 |
| Shawshank Redemption, The (1994) | 4.247665 |
| Pulp Fiction (1994) | 4.244646 |
| Wallace & Gromit: The Best of Aardman Animation (1996) | 4.155187 |
| Manchurian Candidate, The (1962) | 4.127605 |
| Some Folks Call It a Sling Blade (1993) | 4.126187 |
| Close Shave, A (1995) | 4.118404 |
| Vertigo (1958) | 4.11563 |
| 12 Angry Men (1957) | 4.108518 |
| Godfather, The (1972) | 4.105377 |
