# Recommender System

Build a recommender system using cosine similarity scores between users.

## Steps:

1. Import new data set
    - Understand the dataset by looking into it.
    - Check data info and null values.
2. Pivoting Dataset
    - Pivot the ratings into a user-by-book matrix to create a suitable new dataset.
3. Filling Sparse Matrix
    - Fill the missing entries of the sparse matrix with a suitable value.
4. Cosine Similarity Between Users
    - Compute cosine similarity from pairwise cosine distances.
    - Make a new dataframe and change its row and column labels.
    - Fill the diagonal with zeros.
5. System Analysis
    - Find the most similar users.
    - Understand and point out how they are similar with an example.
6. Final system
    - Merge two users and check how they are similar and what can be recommended.
7. Conclusion

## Import New Dataset

In [2]:
#load the libraries
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import cosine, correlation
import warnings
warnings.filterwarnings('ignore')

In [3]:
raw_data = pd.read_csv("C:\\Users\\Vignesh R Babu\\excelR-datascience\\assignment_10_RecommenderSys\\book.csv", 
                       encoding='latin-1').drop(columns='Unnamed: 0')
raw_data.rename(columns={'User.ID':'user_id','Book.Title':'book_title','Book.Rating':'book_rating'},inplace=True)
raw_data.head() 

,user_id,book_title,book_rating
0,276726,Classical Mythology,5
1,276729,Clara Callan,3
2,276729,Decision in Normandy,6
3,276736,Flu: The Story of the Great Influenza Pandemic...,8
4,276737,The Mummies of Urumchi,6


In [4]:
raw_data.info()
#There are no null values

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   user_id      10000 non-null  int64 
 1   book_title   10000 non-null  object
 2   book_rating  10000 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 234.5+ KB


In [14]:
#number of unique users in the dataset
num_user = raw_data.user_id.unique()
len(num_user)

2182

In [15]:
#No of unique books in the dataset
num_book = raw_data.book_title.unique()
len(num_book)

9659
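
The three counts reported so far already indicate how sparse the user-by-book matrix will be. A quick back-of-the-envelope check, using only the figures printed above (the variable names are illustrative, not from the notebook):

```python
# Counts reported by the cells above
n_ratings = 10_000   # rows in raw_data
n_users = 2_182      # unique user_id values
n_books = 9_659      # unique book_title values

n_cells = n_users * n_books

# Upper bound on the filled share: duplicate (user, title) pairs,
# if any exist, would land in the same cell.
density = n_ratings / n_cells

print(f"cells in the user-by-book matrix: {n_cells:,}")
print(f"share of cells holding a rating:  {density:.4%}")
```

Well under 0.1% of the cells can hold a rating, so most pairs of users should be expected to share no books at all.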

## Pivoting Dataset

Reference (pandas reshaping guide): https://pandas-docs.github.io/pandas-docs-travis/user_guide/reshaping.html

In [18]:
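#Rows are users, columns are book titles, cells are ratings
#pivot_table averages duplicate (user, title) pairs (its default aggfunc is the mean)
user_books_df = raw_data.pivot_table(index='user_id',
                                     columns='book_title',
                                     values='book_rating')
user_books_df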

book_title,"Jason, Madison &amp",Other Stories;Merril;1985;McClelland &amp,Repairing PC Drives &amp,'48,'O Au No Keia: Voices from Hawai'I's Mahu and Transgender Communities,...AND THE HORSE HE RODE IN ON : THE PEOPLE V. KENNETH STARR,01-01-00: A Novel of the Millennium,"1,401 More Things That P*Ss Me Off",10 Commandments Of Dating,"100 Great Fantasy Short, Short Stories",...,Zora Hurston and the Chinaberry Tree (Reading Rainbow Book),\Even Monkeys Fall from Trees\ and Other Japanese Proverbs,\I Won't Learn from You\: And Other Thoughts on Creative Maladjustment,"\More More More,\ Said the Baby",\O\ Is for Outlaw,"\Surely You're Joking, Mr. Feynman!\: Adventures of a Curious Character","\Well, there's your problem\: Cartoons",iI Paradiso Degli Orchi,stardust,Ã?Â?bermorgen.
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
8,,,,,,,,,,,...,,,,,,,,,,
9,,,,,,,,,,,...,,,,,,,,,,
10,,,,,,,,,,,...,,,,,,,,,,
12,,,,,,,,,,,...,,,,,,,,,,
14,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
278846,,,,,,,,,,,...,,,,,,,,,,
278849,,,,,,,,,,,...,,,,,,,,,,
278851,,,,,,,,,,,...,,,,,,,,7.0,,
278852,,,,,,,,,,,...,,,,,,,,,,
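
To see what the pivot does on something small enough to read, here is a minimal sketch on a made-up ratings table (the `toy` frame and its values are invented for illustration). It shows three things that also apply to the real data: unrated cells become `NaN`, the row index comes out sorted by `user_id`, and repeated (user, title) pairs are averaged.

```python
import pandas as pd

# Made-up ratings, deliberately not sorted by user_id.
# User 10 rates book "A" twice (6 and 10).
toy = pd.DataFrame({
    "user_id":     [30, 10, 20, 10, 30],
    "book_title":  ["B", "A", "B", "A", "C"],
    "book_rating": [4, 6, 8, 10, 2],
})

toy_pivot = toy.pivot_table(index="user_id",
                            columns="book_title",
                            values="book_rating")
print(toy_pivot)
```

User 10's two ratings of "A" collapse into their mean, and the rows appear in the order 10, 20, 30 regardless of the order in `toy`.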


## Filling Sparse Matrix

In [20]:
#Treat unrated books as a rating of 0 so that distances can be computed
user_books_df.fillna(0, inplace=True)
user_books_df

book_title,"Jason, Madison &amp",Other Stories;Merril;1985;McClelland &amp,Repairing PC Drives &amp,'48,'O Au No Keia: Voices from Hawai'I's Mahu and Transgender Communities,...AND THE HORSE HE RODE IN ON : THE PEOPLE V. KENNETH STARR,01-01-00: A Novel of the Millennium,"1,401 More Things That P*Ss Me Off",10 Commandments Of Dating,"100 Great Fantasy Short, Short Stories",...,Zora Hurston and the Chinaberry Tree (Reading Rainbow Book),\Even Monkeys Fall from Trees\ and Other Japanese Proverbs,\I Won't Learn from You\: And Other Thoughts on Creative Maladjustment,"\More More More,\ Said the Baby",\O\ Is for Outlaw,"\Surely You're Joking, Mr. Feynman!\: Adventures of a Curious Character","\Well, there's your problem\: Cartoons",iI Paradiso Degli Orchi,stardust,Ã?Â?bermorgen.
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
278846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
278849,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
278851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,0.0
278852,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Cosine Similarity Between Users

In [26]:
cos_sim = 1 - pairwise_distances(user_books_df.values,metric='cosine')
cos_sim

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])
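
Cosine similarity is the dot product of two rating vectors divided by the product of their lengths, which is the quantity `1 - cosine distance` yields. A small hand-computed sketch with NumPy only, on three hypothetical users (all values invented):

```python
import numpy as np

# Three hypothetical users rating four books (0 = not rated)
ratings = np.array([
    [5.0, 3.0, 0.0, 0.0],
    [10.0, 6.0, 0.0, 0.0],   # same taste as user 0, but on a larger scale
    [0.0, 0.0, 7.0, 8.0],    # no book in common with users 0 and 1
])

# Scale every row to unit length; this assumes each user has
# at least one non-zero rating, otherwise the norm would be 0.
norms = np.linalg.norm(ratings, axis=1, keepdims=True)
unit = ratings / norms

# Dot products of unit vectors are the pairwise cosine similarities
manual_sim = unit @ unit.T
print(np.round(manual_sim, 3))
```

Users 0 and 1 score 1 because their vectors point in the same direction, while user 2 scores 0 against both because there is no overlap. With a matrix as sparse as this one, zeros off the diagonal are the norm, which matches the output above.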

In [27]:
#Store the results in a dataframe
cos_sim_df = pd.DataFrame(cos_sim)

In [30]:
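#Set the index and column names to user ids
#Caution: num_user is in order of first appearance in raw_data, while the rows of
#cos_sim follow the sorted user_id index of user_books_df; user_books_df.index
#would give labels aligned with the matrix rows.
cos_sim_df.index = num_user
cos_sim_df.columns = num_user
cos_sim_df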

,276726,276729,276736,276737,276744,276745,276747,276748,276751,276754,...,162085,162091,162092,162095,162103,162107,162109,162113,162121,162129
276726,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
276729,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
276736,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
276737,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
276744,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
162107,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
162109,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
162113,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
162121,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
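
One point worth checking in the labelling step: the pivoted table earlier starts at user 8, whereas the labelled similarity frame starts at user 276726. The sketch below, on invented data, illustrates why the two orders can differ. `unique()` keeps the order of first appearance, while the pivot index is sorted.

```python
import pandas as pd

# Made-up ratings, not sorted by user_id
toy = pd.DataFrame({
    "user_id":     [30, 10, 20],
    "book_title":  ["B", "A", "B"],
    "book_rating": [4, 6, 8],
})
toy_pivot = toy.pivot_table(index="user_id",
                            columns="book_title",
                            values="book_rating")

appearance_order = toy["user_id"].unique()   # order of first appearance
pivot_order = toy_pivot.index.to_numpy()     # order of the matrix rows

print(appearance_order)
print(pivot_order)
```

If the same holds for the full dataset, labels taken from `user_books_df.index` would line up with the rows of `cos_sim`, and labels taken from `num_user` would not.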


In [35]:
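#Zero the diagonal so a user is never reported as their own nearest neighbour
#This edits cos_sim in place; cos_sim_df reflects it only while it shares memory with that array
np.fill_diagonal(cos_sim, 0)
cos_sim_df.iloc[0:5, 0:5]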

,276726,276729,276736,276737,276744
276726,0.0,0.0,0.0,0.0,0.0
276729,0.0,0.0,0.0,0.0,0.0
276736,0.0,0.0,0.0,0.0,0.0
276737,0.0,0.0,0.0,0.0,0.0
276744,0.0,0.0,0.0,0.0,0.0
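
Whether an in-place edit of a NumPy array shows up in a DataFrame built from it depends on whether pandas copied the data, which may vary with the pandas version and its copy settings. A sketch of an ordering that avoids the question, on a made-up 3x3 matrix with hypothetical user labels: zero the diagonal first, then build the frame.

```python
import numpy as np
import pandas as pd

user_ids = [101, 102, 103]          # hypothetical labels
toy_sim = np.array([
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.5],
    [0.0, 0.5, 1.0],
])

# fill_diagonal works in place and returns None
np.fill_diagonal(toy_sim, 0)

# Build the labelled frame only after the array is final
toy_sim_df = pd.DataFrame(toy_sim, index=user_ids, columns=user_ids)
print(toy_sim_df)
```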


## System Analysis

In [34]:
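#Most similar user for each of the first 20 users
#idxmax returns the first column label (276726) when a row is all zeros, i.e. no overlap with any other user
cos_sim_df.idxmax(axis=1)[0:20]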

276726    276726
276729    276726
276736    276726
276737    276726
276744    276726
276745    276726
276747    276726
276748    161677
276751    276726
276754    276726
276755    276726
276760    276726
276762    276726
276768    276726
276772      1491
276774    278543
276780    276726
276786    276726
276788    276726
276796    276726
dtype: int64
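
Most rows above point to 276726, the first column label, which is what `idxmax` returns when every value in a row is the same. A minimal sketch on an invented similarity frame of how such rows could be masked, so that "no similar user" is not mistaken for a match:

```python
import pandas as pd

# Made-up similarity matrix with the diagonal already zeroed.
# User 101 has no overlap with anyone.
sim = pd.DataFrame(
    [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.9],
     [0.0, 0.9, 0.0]],
    index=[101, 102, 103],
    columns=[101, 102, 103],
)

naive = sim.idxmax(axis=1)          # an all-zero row falls back to the first column

# Keep a neighbour only when the best score is actually positive
best_score = sim.max(axis=1)
nearest = naive.where(best_score > 0)
print(nearest)
```

Here user 101 ends up with `NaN` instead of being paired with itself, while 102 and 103 are matched with each other.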

In [37]:
raw_data[raw_data['user_id'].isin([161677, 276748])] #Ratings of the pair flagged as most similar

,user_id,book_title,book_rating
12,276748,The Middle Stories,6
9190,161677,The Biggest Pumpkin Ever,8
9191,161677,The Twelve Dancing Princesses: A Folk Tale fro...,9
9192,161677,Do You Know?,5
9193,161677,The good-by day (A Little golden book),8
9194,161677,Pooh Trick or Treat! (Little Golden Books),6
9195,161677,Cookie Monster/Cookie Tree,8
9196,161677,My Little Golden Book of Cars and Trucks (Litt...,7
9197,161677,I Think That It is Wonderful: Featuring Jim He...,8
9198,161677,Grover's Own Alphabet,8


In [43]:
#Ratings of two users picked from the list above
user_1 = raw_data[raw_data['user_id'] == 161677]
user_2 = raw_data[raw_data['user_id'] == 278543]

In [44]:
user_2.book_title

2110    Other Voices, Other Rooms (Vintage International)
2111            Rebuilding Coventry: A Tale of Two Cities
2112             All Through the Night (Holiday Classics)
2113         Yeats Is Dead! A Mystery by 15 Irish Writers
Name: book_title, dtype: object

In [45]:
user_1.book_title

9190                             The Biggest Pumpkin Ever
9191    The Twelve Dancing Princesses: A Folk Tale fro...
9192                                         Do You Know?
9193               The good-by day (A Little golden book)
9194           Pooh Trick or Treat! (Little Golden Books)
9195                           Cookie Monster/Cookie Tree
9196    My Little Golden Book of Cars and Trucks (Litt...
9197    I Think That It is Wonderful: Featuring Jim He...
9198                                Grover's Own Alphabet
9199     Best Little Word Book Ever! (Little Golden Book)
9200      Busiest Firefighters Ever! (Little Golden Book)
9201                  The Monster at the End of This Book
9202                                             Kat Kong
9203                             IF YOU'RE AFRAID OF DARK
Name: book_title, dtype: object

## Final System

In [46]:
pd.merge(user_1,user_2,on='book_title',how='outer')

,user_id_x,book_title,book_rating_x,user_id_y,book_rating_y
0,161677.0,The Biggest Pumpkin Ever,8.0,,
1,161677.0,The Twelve Dancing Princesses: A Folk Tale fro...,9.0,,
2,161677.0,Do You Know?,5.0,,
3,161677.0,The good-by day (A Little golden book),8.0,,
4,161677.0,Pooh Trick or Treat! (Little Golden Books),6.0,,
5,161677.0,Cookie Monster/Cookie Tree,8.0,,
6,161677.0,My Little Golden Book of Cars and Trucks (Litt...,7.0,,
7,161677.0,I Think That It is Wonderful: Featuring Jim He...,8.0,,
8,161677.0,Grover's Own Alphabet,8.0,,
9,161677.0,Best Little Word Book Ever! (Little Golden Book),10.0,,
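
The outer merge lists both users' books side by side. Turning that into a recommendation means keeping the books one user rated that the other has not. A minimal sketch of that step on invented data follows. The helper `recommend_from_neighbour` and the `min_rating` threshold are hypothetical and not part of the notebook.

```python
import pandas as pd

def recommend_from_neighbour(ratings, target_id, neighbour_id, min_rating=7):
    """Titles the neighbour rated >= min_rating that the target has not rated."""
    target_books = set(ratings.loc[ratings["user_id"] == target_id, "book_title"])
    neighbour = ratings[ratings["user_id"] == neighbour_id]
    # Drop anything the target user has already rated
    unseen = neighbour[~neighbour["book_title"].isin(target_books)]
    # Keep only books the neighbour liked (threshold is an assumption)
    liked = unseen[unseen["book_rating"] >= min_rating]
    return liked.sort_values("book_rating", ascending=False)["book_title"].tolist()

# Made-up ratings with the same column names as raw_data
toy = pd.DataFrame({
    "user_id":     [1, 1, 2, 2, 2],
    "book_title":  ["A", "B", "A", "C", "D"],
    "book_rating": [8, 7, 9, 10, 3],
})

print(recommend_from_neighbour(toy, target_id=1, neighbour_id=2))
```

With the notebook's data, the same call could be made on `raw_data` with a user and its nearest neighbour, provided the neighbour really does share rated books with that user.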


## Conclusion
- A very basic recommender system was built that suggests books to a user based on the ratings of similar users.
- It is only a proof of concept and is not suitable for practical use as it stands.
- I still need to build a scalable, deployable model starting from this prototype.
- Reference: https://datascienceplus.com/building-a-book-recommender-system-the-basics-knn-and-matrix-factorization/