# **Recommendation System Using Deep Learning (Collaborative Filtering)**

The dataset used in this notebook contains 1,149,780 ratings given by 278,858 users to 271,379 books. User features include user ID, age, and region. Book features include ISBN, year of publication, publisher, cover image, and so on.

The Keras library in Python was used to design a neural network that recommends books to users based on their similarity to other users.

* **Importing Libraries and defining dataframe**

In [12]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [4]:
df = pd.read_csv('../input/bookcrossing-dataset/Books Data with Category Language and Summary/Preprocessed_data.csv')
df

Unnamed: 0.1,Unnamed: 0,user_id,location,age,isbn,rating,book_title,book_author,year_of_publication,publisher,img_s,img_m,img_l,Summary,Language,Category,city,state,country
0,0,2,"stockton, california, usa",18.0000,0195153448,0,Classical Mythology,Mark P. O. Morford,2002.0,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,Provides an introduction to classical myths pl...,en,['Social Science'],stockton,california,usa
1,1,8,"timmins, ontario, canada",34.7439,0002005018,5,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],timmins,ontario,canada
2,2,11400,"ottawa, ontario, canada",49.0000,0002005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],ottawa,ontario,canada
3,3,11676,"n/a, n/a, n/a",34.7439,0002005018,8,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],,,
4,4,41385,"sudbury, ontario, canada",34.7439,0002005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],sudbury,ontario,canada
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1031170,1031170,278851,"dallas, texas, usa",33.0000,0743203763,0,As Hogan Said . . . : The 389 Best Things Anyo...,Randy Voorhees,2000.0,Simon & Schuster,http://images.amazon.com/images/P/0743203763.0...,http://images.amazon.com/images/P/0743203763.0...,http://images.amazon.com/images/P/0743203763.0...,Golf lovers will revel in this collection of t...,en,['Humor'],dallas,texas,usa
1031171,1031171,278851,"dallas, texas, usa",33.0000,0767907566,5,All Elevations Unknown: An Adventure in the He...,Sam Lightner,2001.0,Broadway Books,http://images.amazon.com/images/P/0767907566.0...,http://images.amazon.com/images/P/0767907566.0...,http://images.amazon.com/images/P/0767907566.0...,A daring twist on the travel-adventure genre t...,en,['Nature'],dallas,texas,usa
1031172,1031172,278851,"dallas, texas, usa",33.0000,0884159221,7,Why stop?: A guide to Texas historical roadsid...,Claude Dooley,1985.0,Lone Star Books,http://images.amazon.com/images/P/0884159221.0...,http://images.amazon.com/images/P/0884159221.0...,http://images.amazon.com/images/P/0884159221.0...,9,9,9,dallas,texas,usa
1031173,1031173,278851,"dallas, texas, usa",33.0000,0912333022,7,The Are You Being Served? Stories: 'Camping In...,Jeremy Lloyd,1997.0,Kqed Books,http://images.amazon.com/images/P/0912333022.0...,http://images.amazon.com/images/P/0912333022.0...,http://images.amazon.com/images/P/0912333022.0...,These hilarious stories by the creator of publ...,en,['Fiction'],dallas,texas,usa


*  **Preprocessing**

Even though the CSV file I used was already preprocessed, I checked for null values and inspected the dataframe info.

In [6]:
df.columns

Index(['Unnamed: 0', 'user_id', 'location', 'age', 'isbn', 'rating',
       'book_title', 'book_author', 'year_of_publication', 'publisher',
       'img_s', 'img_m', 'img_l', 'Summary', 'Language', 'Category', 'city',
       'state', 'country'],
      dtype='object')

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 982279 entries, 0 to 1031174
Data columns (total 19 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   Unnamed: 0           982279 non-null  int64  
 1   user_id              982279 non-null  int64  
 2   location             982279 non-null  object 
 3   age                  982279 non-null  float64
 4   isbn                 982279 non-null  object 
 5   rating               982279 non-null  int64  
 6   book_title           982279 non-null  object 
 7   book_author          982279 non-null  object 
 8   year_of_publication  982279 non-null  float64
 9   publisher            982279 non-null  object 
 10  img_s                982279 non-null  object 
 11  img_m                982279 non-null  object 
 12  img_l                982279 non-null  object 
 13  Summary              982279 non-null  object 
 14  Language             982279 non-null  object 
 15  Category        

In [5]:
# Drop records that contain NaN values
df.dropna(inplace=True)

In [8]:
# Count the remaining NaN values in each column
df.isna().sum()

Unnamed: 0             0
user_id                0
location               0
age                    0
isbn                   0
rating                 0
book_title             0
book_author            0
year_of_publication    0
publisher              0
img_s                  0
img_m                  0
img_l                  0
Summary                0
Language               0
Category               0
city                   0
state                  0
country                0
dtype: int64
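As a side note on the null handling above, here is a minimal sketch on a made-up toy frame (the values and the `needed` column list are assumptions for illustration, not taken from the dataset). It contrasts dropping rows with a NaN in any column against dropping only rows that lack a value in the columns the recommender later uses.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; only the column names mirror the notebook.
toy = pd.DataFrame({
    "user_id": [1, 2, 3],
    "isbn": ["a", "b", "c"],
    "rating": [5, 0, 8],
    "country": ["iran", None, "usa"],
    "Summary": ["text", "text", np.nan],
})

# Columns assumed to be required by the later steps (hypothetical choice).
needed = ["user_id", "isbn", "rating", "country"]

strict = toy.dropna()                 # drops a row if ANY column is missing
lenient = toy.dropna(subset=needed)   # drops a row only if a needed column is missing
missing = toy.isna().sum()            # per-column count of missing values

print(len(strict), len(lenient))
print(missing)
```

Whether the lenient variant is preferable depends on how the other columns are used later; here it only shows that the two calls can keep different numbers of rows.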

* **Recommendation System**

The following recommendation system was developed for a subset of this dataset: the users who voted in Iran. It can be extended to another region or to the whole dataset. The only reason for this restriction was to reduce processing time.

In [9]:
df_ir = df.loc[(df["country"] == "iran"), ['user_id','isbn','rating']]
df_ir

Unnamed: 0,user_id,isbn,rating
606,125774,0452264464,0
915,18527,0971880107,6
2422,186039,0971880107,0
2582,205871,0971880107,0
3986,125774,0425099148,9
...,...,...,...
983115,205871,0192832328,0
983116,205871,0486647625,10
985276,217895,0373168497,0
993322,217895,0020517505,0


In [10]:
user_ids = df_ir["user_id"].unique().tolist()
user2user_encoded = {x: i for i, x in enumerate(user_ids)}
userencoded2user = {i: x for i, x in enumerate(user_ids)}
book_ids = df_ir["isbn"].unique().tolist()
book2book_encoded = {x: i for i, x in enumerate(book_ids)}
book_encoded2book = {i: x for i, x in enumerate(book_ids)}
df_ir["user"] = df_ir["user_id"].map(user2user_encoded)
df_ir["book"] = df_ir["isbn"].map(book2book_encoded)

num_users = len(user2user_encoded)
num_books = len(book_encoded2book)
df_ir["rating"] = df_ir["rating"].values.astype(np.float32)

# min and max ratings will be used to normalize the ratings
min_rating = min(df_ir["rating"])
max_rating = max(df_ir["rating"])

print("Number of users: {}, Number of Books: {}, Min rating: {}, Max rating: {}".format(num_users, num_books, min_rating, max_rating))


Number of users: 18, Number of Books: 1471, Min rating: 0.0, Max rating: 10.0
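The dictionaries above map raw user IDs and ISBNs to contiguous integer indices, which is what an embedding layer expects. As a hedged alternative sketch, `pd.factorize` produces the same kind of mapping in one call; the toy frame below is invented for illustration.

```python
import pandas as pd

# Toy ratings frame; the rows are made up for illustration.
toy = pd.DataFrame({
    "user_id": [125774, 18527, 125774, 205871],
    "isbn": ["0452264464", "0971880107", "0425099148", "0971880107"],
})

# factorize() assigns 0, 1, 2, ... in order of first appearance,
# the same order that enumerate() over unique() would give.
user_codes, user_index = pd.factorize(toy["user_id"])
book_codes, book_index = pd.factorize(toy["isbn"])

toy["user"] = user_codes
toy["book"] = book_codes

# Decoding: indexing the uniques with a code returns the original id.
decoded_users = user_index[user_codes]

print(toy)
print(list(decoded_users))
```

The returned `user_index` and `book_index` play the role of the reverse dictionaries, so encoded predictions can be translated back to ISBNs.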


In [11]:
df_ir= df_ir.sample(frac=1, random_state=42)
x = df_ir[["user", "book"]].values

# Normalize the targets between 0 and 1 (it's easier to train)
y = df_ir["rating"].apply(lambda x: (x - min_rating) / (max_rating - min_rating)).values

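# Train on 80% of the data and validate on the remaining 20%.
# x holds the encoded (user, book) indices and y the normalized ratings,
# so neither is overwritten with the raw ids or raw ratings before the split.
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)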
x_train.shape, x_val.shape, y_train.shape, y_val.shape

((1306, 2), (327, 2), (1306,), (327,))
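The target normalization used in this cell is a min-max scaling to the range 0 to 1, which matches the sigmoid output of the model. A minimal sketch of the scaling and its inverse follows; the helper names `normalize` and `denormalize` are hypothetical and the ratings are made up.

```python
import numpy as np

def normalize(rating, lo, hi):
    """Map a rating from [lo, hi] to [0, 1]."""
    return (rating - lo) / (hi - lo)

def denormalize(score, lo, hi):
    """Map a model score from [0, 1] back to the [lo, hi] rating scale."""
    return score * (hi - lo) + lo

ratings = np.array([0.0, 5.0, 8.0, 10.0], dtype=np.float32)
lo, hi = ratings.min(), ratings.max()

scaled = normalize(ratings, lo, hi)
restored = denormalize(scaled, lo, hi)

print(scaled)
print(restored)
```

Applying `denormalize` to the model's predictions would express them on the original 0 to 10 rating scale.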

In [58]:
class RecommenderNet(keras.Model):
    def __init__(self, num_users, num_books, embedding_size, **kwargs):
        super(RecommenderNet, self).__init__(**kwargs)
        self.num_users = num_users
        self.num_books = num_books
        self.embedding_size = embedding_size
        self.user_embedding = layers.Embedding(
            num_users,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.user_bias = layers.Embedding(num_users, 1)
        self.book_embedding = layers.Embedding(
            num_books,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.book_bias = layers.Embedding(num_books, 1)

    def call(self, inputs):
        user_vector = self.user_embedding(inputs[:, 0])
        user_bias = self.user_bias(inputs[:, 0])
        book_vector = self.book_embedding(inputs[:, 1])
        book_bias = self.book_bias(inputs[:, 1])
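        # Per-pair dot product with shape (batch, 1); tf.tensordot(..., 2) would
        # also contract over the batch axis and return one scalar for the whole batch.
        dot_user_book = tf.reduce_sum(user_vector * book_vector, axis=1, keepdims=True)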
        # Add all the components (including bias)
        x = dot_user_book + user_bias + book_bias
        # The sigmoid activation forces the rating to between 0 and 1
        return tf.nn.sigmoid(x)

EMBEDDING_SIZE = 50
    
model = RecommenderNet(num_users, num_books, EMBEDDING_SIZE)
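# `learning_rate` is the current argument name; `lr` is a deprecated alias in Keras optimizers.
model.compile(loss=tf.keras.losses.BinaryCrossentropy(), optimizer=keras.optimizers.Adam(learning_rate=0.000002))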
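To make the scoring rule concrete, the following NumPy sketch reproduces the intended forward pass for a batch of (user, book) index pairs: one dot product per pair, plus the user and book biases, passed through a sigmoid. The sizes, the random embeddings, the zero-initialized biases, and the `predict` helper are all assumptions for illustration, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_books, embedding_size = 3, 4, 5  # toy sizes

# Randomly initialized tables standing in for the learned embeddings.
user_emb = rng.normal(size=(num_users, embedding_size))
book_emb = rng.normal(size=(num_books, embedding_size))
user_bias = np.zeros((num_users, 1))
book_bias = np.zeros((num_books, 1))

def predict(pairs):
    """Score each (user, book) row of `pairs`; returns shape (batch, 1)."""
    u = user_emb[pairs[:, 0]]                    # (batch, embedding_size)
    b = book_emb[pairs[:, 1]]                    # (batch, embedding_size)
    dot = np.sum(u * b, axis=1, keepdims=True)   # one dot product per row
    logits = dot + user_bias[pairs[:, 0]] + book_bias[pairs[:, 1]]
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid keeps scores in (0, 1)

pairs = np.array([[0, 1], [2, 3]])
scores = predict(pairs)
print(scores.shape)
```

Each output row depends only on the embeddings and biases of its own user and book, which is the property a per-pair rating predictor needs.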

In [59]:
history = model.fit(
    x=x_train,
    y=y_train,
    batch_size=64,
    epochs=5,
    verbose=1,
    validation_data=(x_val, y_val),
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [60]:
book_df = df.loc[(df["country"] == "iran") & (df["Category"] != "9"), ['isbn','book_title','Category']]
book_df

Unnamed: 0,isbn,book_title,Category
606,0452264464,Beloved (Plume Contemporary Fiction),['Fiction']
915,0971880107,Wild Animus,['Fiction']
2422,0971880107,Wild Animus,['Fiction']
2582,0971880107,Wild Animus,['Fiction']
3986,0425099148,Death in the Clouds,['Fiction']
...,...,...,...
965043,1859675352,Dried Flowers: Over 20 Natural Projects for th...,['Gardening']
965046,1887166661,Everything Here Is Mine: An Unhelpful Guide to...,['Humor']
983115,0192832328,On Christian Teaching (World's Classics),['Apologetics']
983116,0486647625,Complex Analysis With Applications,['Mathematics']


In [61]:
# Top recommendations for a random user

user_id = df_ir.user_id.sample(3).iloc[0]
books_read_by_user = df_ir[df_ir.user_id == user_id]
books_not_read = book_df[~book_df["isbn"].isin(books_read_by_user.isbn.values)]["isbn"]
books_not_read = list(set(books_not_read).intersection(set(book2book_encoded.keys())))

books_not_read = [[book2book_encoded.get(x)] for x in books_not_read]

user_encoder = user2user_encoded.get(user_id)

user_book_array = np.hstack(([[user_encoder]] * len(books_not_read), books_not_read))

ratings = model.predict(user_book_array).flatten()

top_ratings_indices = ratings.argsort()[-10:][::-1]

recommended_book_ids = [book_encoded2book.get(books_not_read[x][0]) for x in top_ratings_indices]

print("Showing recommendations for user: {}".format(user_id))
print("====" * 9)
print("Books with high ratings from user")
print("----" * 8)
top_books_user = ( books_read_by_user.sort_values(by="rating", ascending=False).head(5).isbn.values)
book_df_rows = book_df[book_df["isbn"].isin(top_books_user)]
for row in book_df_rows.itertuples():
    print(row.book_title, ":", row.Category)

print("----" * 8)
print("Top 10 Books recommendations")
print("----" * 8)
recommended_books = book_df[book_df["isbn"].isin(recommended_book_ids)]
for row in recommended_books.itertuples():
    print(row.book_title, ":", row.Category)

Showing recommendations for user: 186039
Books with high ratings from user
--------------------------------
Transform Your Home in a Weekend : ['Interior decoration']
--------------------------------
Top 10 Books recommendations
--------------------------------
The Body in the Library (Miss Marple Mysteries (Paperback)) : ['Fiction']
Harry Potter and the Chamber of Secrets Postcard Book : ['Juvenile Fiction']
Riley in the Morning : ['Fiction']
Bread and Jam for Frances : ['Juvenile Fiction']
Personal Protector (Colby Agency) (Harlequin Intrigue Series, No. 659) : ['Fiction']
Major Attraction (4 Strong Men) (Harlequin Super Romance, No 649) : ['Fiction']
Disney's Mulan Classic Storybook (The Mouse Works Classics Collection) : ['Juvenile Fiction']
The Boy on the Porch : ['Fiction']
Inhuman Beings : ['Fiction']
Green Lion Of Zion Street, The : ['African American children']
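The ranking step above picks the highest predicted scores with `argsort`. A minimal sketch of that selection on made-up scores is shown here; `top_k` is a hypothetical helper name.

```python
import numpy as np

def top_k(scores, k):
    """Return the indices of the k largest scores, best first."""
    scores = np.asarray(scores)
    order = np.argsort(scores)[::-1]  # ascending sort, then reversed
    return order[:k]

# Invented predicted scores for five candidate books.
predicted = np.array([0.12, 0.87, 0.45, 0.91, 0.30])
best = top_k(predicted, 3)
print(best)
```

Note that filtering a dataframe with `isin` returns rows in the dataframe's own order, so the ranking order has to be carried separately if it should be shown in the printed list.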


In [62]:
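# Evaluation of this recommendation system.
# Note: model.compile() was given no metrics, so model.evaluate() returns the loss;
# the value printed below is that loss multiplied by 100, not a classification accuracy.
# x and y also include the training rows, so this is not a held-out estimate.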
print('Accuracy: %.2f' % (model.evaluate(x,y)*100))

Accuracy: 93.17
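For a rating predictor, error metrics such as MAE and RMSE on held-out data are a common way to quantify quality. The sketch below computes both on the original rating scale from normalized targets and predictions; the arrays are invented and `scale` assumes ratings between 0 and 10, so the numbers say nothing about the model above.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up normalized targets and predictions for four validation rows.
y_true = np.array([0.0, 0.5, 1.0, 0.8])
y_pred = np.array([0.1, 0.4, 0.9, 0.6])

scale = 10.0  # assumed width of the rating range (max_rating - min_rating)
mae_val = mae(y_true * scale, y_pred * scale)
rmse_val = rmse(y_true * scale, y_pred * scale)

print("MAE: %.2f, RMSE: %.2f" % (mae_val, rmse_val))
```

In the notebook, `y_true` would correspond to `y_val` and `y_pred` to the flattened output of `model.predict(x_val)`.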
