# Movie Recommendation

## Pseudocode


#### Generate the following matrices

Distance matrix [nUser, nUser]: distance between UserA and UserB

Movie Rating matrix [nMovie, nUser]: rating of each movie by each user

Frequency matrix [nMovie]: total number of times each movie was rated


Weight matrix = 1 / Frequency matrix (element-wise)

For a specific user N:

UnweightedRes = Movie Rating [nMovie, nUser] × Distance matrix [nUser, N] --> [nMovie, 1] (matrix-vector product)

result = UnweightedRes .* Weight matrix --> [nMovie, 1] (element-wise product)

return result
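
The pseudocode above can be sketched in NumPy on a toy matrix. This is only an illustration under stated assumptions: the ratings are made up, the function name `score_movies` is hypothetical, Euclidean distance is assumed as the distance measure, and a guard against unrated movies is added that the pseudocode does not mention.

```python
import numpy as np

def score_movies(ratings, user_idx):
    """Score every movie for one user, following the pseudocode above.

    ratings: array of shape [nMovie, nUser], with 0 meaning "not rated".
    user_idx: column index of the target user N.
    """
    # Column N of the distance matrix: distance from user N to every user.
    distance = np.linalg.norm(ratings - ratings[:, [user_idx]], axis=0)  # [nUser]

    # Frequency: how many users rated each movie; the weight is its reciprocal.
    frequency = np.count_nonzero(ratings, axis=1)                        # [nMovie]
    weight = 1.0 / np.maximum(frequency, 1)  # assumption: guard against unrated movies

    # Matrix-vector product, then element-wise weighting.
    unweighted = ratings @ distance                                      # [nMovie]
    return unweighted * weight

# Toy data (made up for illustration): 3 movies x 3 users.
toy_ratings = np.array([
    [5.0, 0.0, 4.0],
    [0.0, 3.0, 0.0],
    [1.0, 0.0, 0.0],
])
toy_scores = score_movies(toy_ratings, user_idx=0)
```

In this toy example the highest score goes to the second movie, which was rated only by the user farthest from user 0. That follows from using a distance as the multiplier: a distance grows with dissimilarity, so dissimilar users contribute more.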

In [3]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib as mlp

from scipy.sparse import csr_matrix
from mpl_toolkits.axes_grid1 import make_axes_locatable
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error
import itertools
from sklearn.metrics import silhouette_samples, silhouette_score

from scipy import spatial

%matplotlib inline

# Import data

In [126]:
rating = pd.read_csv('movie.csv')

MovieId=rating.iloc[:,0]
rating=rating.drop(columns=['Unnamed: 0'])
rating.index=MovieId
print(rating.head())
print(rating.shape)
UserId=rating.columns
UserId

              1   2   3   4    5    6    7    8   9  10  ...  601  602  603  \
Unnamed: 0                                               ...                  
1           4.0 NaN NaN NaN  4.0  NaN  4.5  NaN NaN NaN  ...  4.0  NaN  4.0   
3           NaN NaN NaN NaN  NaN  4.0  NaN  4.0 NaN NaN  ...  NaN  4.0  NaN   
6           4.0 NaN NaN NaN  NaN  5.0  NaN  NaN NaN NaN  ...  NaN  NaN  NaN   
47          NaN NaN NaN NaN  NaN  3.0  NaN  NaN NaN NaN  ...  NaN  NaN  NaN   
50          NaN NaN NaN NaN  NaN  5.0  NaN  NaN NaN NaN  ...  NaN  NaN  NaN   

            604  605  606  607  608  609  610  
Unnamed: 0                                     
1           3.0  4.0  2.5  4.0  2.5  3.0  5.0  
3           5.0  3.5  NaN  NaN  2.0  NaN  NaN  
6           NaN  NaN  NaN  NaN  2.0  NaN  NaN  
47          NaN  NaN  NaN  NaN  NaN  NaN  NaN  
50          3.0  NaN  NaN  NaN  NaN  NaN  NaN  

[5 rows x 610 columns]
(9724, 610)


Index(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10',
       ...
       '601', '602', '603', '604', '605', '606', '607', '608', '609', '610'],
      dtype='object', length=610)

# Calculate top recommended movies

In [127]:
R=rating.fillna(0) # rating matrix, with missing ratings filled as 0
User='1'

In [141]:
def recommendation(rating, person):
    """Score every movie for `person`; `rating` is [nMovie, nUser] with NaN filled as 0."""
    target = rating.loc[:, person]

    # Cosine similarity between `person` and every user (used by the alternative below)
    cos_dis_matrix = []
    for a in rating.columns:
        cos_dis_matrix.append(1 - spatial.distance.cosine(target, rating.loc[:, a]))

    # Euclidean distance between `person` and every user
    Euc_dis_matrix = []
    for a in rating.columns:
        Euc_dis_matrix.append(np.linalg.norm(target - rating.loc[:, a]))

    # Weight matrix: reciprocal of the number of users who rated each movie
    freq_matrix = np.count_nonzero(rating, axis=1)
    weight_matrix = 1 / freq_matrix

    # UnweightedRes = Movie Rating [nMovie, nUser] x Distance [nUser] --> [nMovie]
    # Alternative: Unweighted_Score = np.dot(rating, cos_dis_matrix)
    # Note: Euclidean distance grows with dissimilarity, so dissimilar users weigh more here.
    Unweighted_Score = np.dot(rating, Euc_dis_matrix)

    # Score for each movie, sorted from highest to lowest
    Score = pd.DataFrame(Unweighted_Score * weight_matrix, index=rating.index, columns=['Score'])
    Score = Score.sort_values(by=['Score'], ascending=False)

    return Score
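
The two loops in the function compute the cosine similarity and the Euclidean distance one user at a time. As a rough sketch, the same quantities can be obtained without Python loops using NumPy broadcasting. The function name `user_distances` is hypothetical, the toy ratings are made up, and the input is assumed to be zero-filled like `R`. A user with no ratings at all would cause a division by zero in the cosine step, which this sketch does not handle.

```python
import numpy as np
import pandas as pd

def user_distances(rating, person):
    """Return (cosine similarity, Euclidean distance) from `person` to every user.

    rating: DataFrame [nMovie, nUser] with NaN already replaced by 0.
    """
    values = rating.to_numpy(dtype=float)          # [nMovie, nUser]
    target = rating[person].to_numpy(dtype=float)  # [nMovie]

    # Cosine similarity: dot product divided by the product of the norms.
    norms = np.linalg.norm(values, axis=0)
    cos_sim = (target @ values) / (np.linalg.norm(target) * norms)

    # Euclidean distance between the target column and every column.
    euc_dist = np.linalg.norm(values - target[:, None], axis=0)

    return (pd.Series(cos_sim, index=rating.columns),
            pd.Series(euc_dist, index=rating.columns))

# Toy ratings (made up for illustration): 3 movies x 3 users.
toy = pd.DataFrame(
    [[5.0, 0.0, 5.0],
     [0.0, 3.0, 0.0],
     [1.0, 0.0, 1.0]],
    index=[10, 20, 30], columns=['1', '2', '3'])
cos_sim, euc_dist = user_distances(toy, '1')
```

Here user '3' has the same ratings as user '1', so the cosine similarity is 1 and the Euclidean distance is 0. User '2' shares no rated movie with user '1', so the similarity is 0 and the distance is the largest of the three. Passing a similarity rather than a distance to `np.dot` would give more weight to like-minded users.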
    

In [142]:
res=recommendation(R,User)
res['movieId']=res.index
res

Unnamed: 0,Score,movieId
85295,880.308895,85295
4437,880.308895,4437
53280,880.308895,53280
25937,800.171857,25937
4356,800.171857,4356
...,...,...
90374,35.653716,90374
4927,35.653716,4927
5636,35.653716,5636
202,35.192329,202


In [143]:
# Get the movie titles
info_df=pd.read_csv('movies.csv')
print(info_df.head())

   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  


In [144]:
# Merge by movieId (note: this rebinds the name `recommendation`, shadowing the function defined above)
recommendation=pd.merge(res,info_df,how='left',on=['movieId'])
print(recommendation.shape)
recommendation

(9724, 4)


Unnamed: 0,Score,movieId,title,genres
0,880.308895,85295,Scooby-Doo! Curse of the Lake Monster (2010),Adventure|Children|Comedy|Mystery
1,880.308895,4437,Suspiria (1977),Horror
2,880.308895,53280,"Breed, The (2006)",Horror|Thriller
3,800.171857,25937,Easter Parade (1948),Musical|Romance
4,800.171857,4356,Gentlemen Prefer Blondes (1953),Comedy|Musical|Romance
...,...,...,...,...
9719,35.653716,90374,Martha Marcy May Marlene (2011),Drama|Thriller
9720,35.653716,4927,"Last Wave, The (1977)",Fantasy|Mystery|Thriller
9721,35.653716,5636,Welcome to Collinwood (2002),Comedy|Crime
9722,35.192329,202,Total Eclipse (1995),Drama|Romance
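
To make the behavior of the left merge concrete, here is a minimal sketch on toy frames. The frames `toy_res` and `toy_info` are made-up stand-ins for `res` and `info_df`, and the movieId 99 is an invented ID with no title entry. `indicator=True` is an optional `pd.merge` argument, not something the cell above uses.

```python
import pandas as pd

# Toy stand-ins (made up) for `res` and `info_df`.
toy_res = pd.DataFrame({'Score': [9.0, 7.5, 3.0], 'movieId': [3, 1, 99]})
toy_info = pd.DataFrame({'movieId': [1, 2, 3],
                         'title': ['Toy Story (1995)', 'Jumanji (1995)',
                                   'Grumpier Old Men (1995)']})

# how='left' keeps every row of toy_res in its original (score-sorted) order;
# indicator=True adds a `_merge` column that flags rows without a match.
toy_merged = pd.merge(toy_res, toy_info, how='left', on='movieId', indicator=True)

# Rows whose movieId has no entry in toy_info get NaN in `title`.
missing = toy_merged[toy_merged['_merge'] == 'left_only']
```

Because the left frame is already sorted by score, the merged result keeps that ranking, and any movie without metadata stays in the table with a missing title instead of being dropped.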


# Print out the result

In [145]:
print('The top 6 recommended movies for userId {} are\n\n {}'.format(User, recommendation.head(6)))

The top 6 recommended movies for userId 1 are

         Score  movieId                                         title  \
0  880.308895    85295  Scooby-Doo! Curse of the Lake Monster (2010)   
1  880.308895     4437                               Suspiria (1977)   
2  880.308895    53280                             Breed, The (2006)   
3  800.171857    25937                          Easter Parade (1948)   
4  800.171857     4356               Gentlemen Prefer Blondes (1953)   
5  792.278005     2651        Frankenstein Meets the Wolf Man (1943)   

                              genres  
0  Adventure|Children|Comedy|Mystery  
1                             Horror  
2                    Horror|Thriller  
3                    Musical|Romance  
4             Comedy|Musical|Romance  
5                             Horror  
