<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>📜 Introduction: </font></h3>
    
With millions of books to choose from, deciding what to read next can be hard. Book recommendation systems help by giving readers personalized suggestions. This notebook builds book recommendations on the goodbooks-10k dataset with collaborative filtering in three variants: item-based, user-based, and model-based (matrix factorization with SVD). Each variant recommends books from the preferences of readers with similar tastes, with the aim of making literature more accessible and the reading experience more personal.

<a id = "1"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Item-Based Recommendation System✨</p>

In [1]:
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

In [2]:
rating = pd.read_csv("/kaggle/input/goodbooks-10k/ratings.csv")
# Load only the metadata columns used below (average_rating was listed twice before)
books = pd.read_csv("/kaggle/input/goodbooks-10k/books.csv",
                    usecols=["book_id",
                             "original_publication_year",
                             "title",
                             "average_rating"])

In [3]:
books.head()

| | book_id | original_publication_year | title | average_rating |
|---|---|---|---|---|
| 0 | 2767052 | 2008.0 | The Hunger Games (The Hunger Games, #1) | 4.34 |
| 1 | 3 | 1997.0 | Harry Potter and the Sorcerer's Stone (Harry P... | 4.44 |
| 2 | 41865 | 2005.0 | Twilight (Twilight, #1) | 3.57 |
| 3 | 2657 | 1960.0 | To Kill a Mockingbird | 4.25 |
| 4 | 4671 | 1925.0 | The Great Gatsby | 3.89 |


In [4]:
rating.head()

| | book_id | user_id | rating |
|---|---|---|---|
| 0 | 1 | 314 | 5 |
| 1 | 1 | 439 | 3 |
| 2 | 1 | 588 | 5 |
| 3 | 1 | 1169 | 4 |
| 4 | 1 | 1185 | 4 |


In [5]:
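# Inner join on book_id: ratings whose book_id does not appear in books are dropped.
# Note that rating.head() shows book_id 1 while books.head() shows IDs such as 2767052;
# if the two files number books differently, this join can drop ratings and attach
# the wrong titles, so it is worth comparing df.shape with rating.shape.
df = pd.merge(books, rating, how="inner", on="book_id")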

In [6]:
df.shape

(79701, 6)
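
An inner join drops every rating whose `book_id` has no match in `books`, without any warning. One way to measure how much is lost is a left join from the ratings side with `indicator=True`, which labels each row `both` or `left_only`. A minimal sketch on hypothetical toy frames (`toy_books` and `toy_rating` stand in for the two CSV files; they are not the real data):

```python
import pandas as pd

# Hypothetical toy frames standing in for books.csv and ratings.csv
toy_books = pd.DataFrame({"book_id": [1, 2, 3], "title": ["A", "B", "C"]})
toy_rating = pd.DataFrame({
    "book_id": [1, 1, 2, 4],
    "user_id": [10, 11, 10, 12],
    "rating": [5, 3, 4, 2],
})

# Left join from the ratings side; indicator=True adds a "_merge" column that is
# "both" for matched rows and "left_only" for ratings with no matching book
check = pd.merge(toy_rating, toy_books, how="left", on="book_id", indicator=True)
coverage = check["_merge"].value_counts()

# Share of ratings that an inner join would keep
kept_share = (check["_merge"] == "both").mean()
print(coverage)
print(f"inner join keeps {kept_share:.0%} of ratings")
```

Run against `rating` and `books`, the same check shows what share of the ratings file survives into the 79,701-row `df`.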

In [7]:
# User x title matrix of booleans: True where the user has rated the title
user_df = df.groupby(["user_id","title"])["rating"].mean().unstack().notnull()
user_df

| user_id | 'Salem's Lot | 'Tis (Frank McCourt, #2) | 1421: The Year China Discovered America | 1776 | 1984 | A Bend in the River | A Bend in the Road | A Brief History of Time | A Briefer History of Time | A Case of Need | ... | Women in Love (Brangwen Family, #2) | World War Z: An Oral History of the Zombie War | World Without End (The Kingsbridge Series, #2) | Wuthering Heights | Xenocide (Ender's Saga, #3) | Year of Wonders | You Shall Know Our Velocity! | Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values | Zodiac | number9dream |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 7 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 9 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53419 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 53420 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 53422 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 53423 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


In [8]:
# Pick one random book title from the dataset

sample_name = pd.Series(user_df.columns).sample(1, random_state = 42).values[0]

sample_name

'Heidi'

In [9]:
# Boolean column for Heidi: True for every user who rated it, False otherwise

sample = user_df[sample_name]

In [10]:
sample

user_id
2        False
3        False
4        False
7        False
9        False
         ...  
53419    False
53420    False
53422    False
53423    False
53424    False
Name: Heidi, Length: 28906, dtype: bool

In [11]:
# Books whose rated/not-rated pattern correlates most with Heidi's; after Heidi itself, these are the item-based recommendations

user_df.corrwith(sample).sort_values(ascending=False).head(10)

title
Heidi                                                           1.000000
Harry Potter Collection (Harry Potter, #1-6)                    0.528368
Harry Potter and the Order of the Phoenix (Harry Potter, #5)    0.518334
Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)     0.488230
Harry Potter and the Half-Blood Prince (Harry Potter, #6)       0.488230
Heretics of Dune (Dune Chronicles #5)                           0.478195
Harry Potter Boxed Set, Books 1-5 (Harry Potter, #1-5)          0.478195
The Lord of the Rings (The Lord of the Rings, #1-3)             0.468160
Notes from a Small Island                                       0.460460
Harry Potter and the Sorcerer's Stone (Harry Potter, #1)        0.448091
dtype: float64
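
Because `user_df` holds True/False "has rated" flags, `corrwith` here measures how often two books are read by the same people, not whether they are rated alike. The sketch below contrasts that presence-based similarity with a rating-based Pearson similarity on a small hypothetical ratings table; `min_periods` is an extra safeguard that this notebook does not use.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, title, rating)
ratings_toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 4],
    "title":   ["Heidi", "Dune", "Heidi", "Dune", "Emma", "Dune", "Emma"],
    "rating":  [5, 4, 4, 5, 3, 2, 4],
})

# User x title matrix; NaN where a user has not rated a title
matrix = ratings_toy.pivot_table(index="user_id", columns="title", values="rating")

# Presence-based similarity (as in the cell above): correlate 1/0 "has rated" columns
presence = matrix.notnull().astype(int)
presence_sim = presence.corrwith(presence["Heidi"]).sort_values(ascending=False)

# Rating-based alternative: Pearson correlation over users who rated both titles;
# min_periods=2 leaves pairs with fewer than two shared readers as NaN
rating_sim = matrix.corr(min_periods=2)["Heidi"].sort_values(ascending=False)
print(presence_sim)
print(rating_sim)
```

In the toy data, Dune is read by the same people as Heidi (positive presence correlation) but rated the opposite way (rating correlation of -1), so the two measures can disagree.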

<center><img src="https://i.imgur.com/Y2DRcty.jpg" width="800" height="800"></center>

<a id = "2"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨User-Based Recommendation System✨</p>

In [12]:
user_df = df.groupby(["user_id","title"])["rating"].mean().unstack()

In [13]:
random_user = user_df.sample(1,random_state=689).index[0]

In [14]:
random_user_df = user_df[user_df.index == random_user]
random_user_df

| user_id | 'Salem's Lot | 'Tis (Frank McCourt, #2) | 1421: The Year China Discovered America | 1776 | 1984 | A Bend in the River | A Bend in the Road | A Brief History of Time | A Briefer History of Time | A Case of Need | ... | Women in Love (Brangwen Family, #2) | World War Z: An Oral History of the Zombie War | World Without End (The Kingsbridge Series, #2) | Wuthering Heights | Xenocide (Ender's Saga, #3) | Year of Wonders | You Shall Know Our Velocity! | Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values | Zodiac | number9dream |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20467 | | | | | | | | | | | ... | | | | | | | | | | |


<a id = "3"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Other Users Reading the Same Books</div>

In [15]:
book_read = random_user_df.dropna(axis=1).columns.tolist()
book_read

['Burmese Days',
 'Daniel Deronda',
 'Freakonomics: A Rogue Economist Explores the Hidden Side of Everything (Freakonomics, #1)',
 'Harry Potter and the Half-Blood Prince (Harry Potter, #6)',
 'Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)',
 'Heidi',
 'I am Charlotte Simmons',
 'Me Talk Pretty One Day',
 'Quicksilver (The Baroque Cycle, #1)',
 'The 158-Pound Marriage',
 'The Broken Wings',
 'The Corrections',
 'The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory',
 'The Fellowship of the Ring (The Lord of the Rings, #1)',
 "The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide to the Galaxy, #1)",
 'The Known World',
 'The Long Dark Tea-Time of the Soul (Dirk Gently, #2)',
 'The Lord of the Rings (The Lord of the Rings, #1-3)',
 'The Lord of the Rings: The Art of The Fellowship of the Ring',
 'The Phantom Tollbooth',
 'Tropic of Cancer']

<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; font-size:100%; text-align:left">

<h3 align="left"><font color='#6B8BA0'>🗨️ Comment: </font></h3>

We now look at how the other readers rated the books that the randomly selected reader has read.

In [16]:
book_read_df = user_df[book_read]
book_read_df

| user_id | Burmese Days | Daniel Deronda | Freakonomics: A Rogue Economist Explores the Hidden Side of Everything (Freakonomics, #1) | Harry Potter and the Half-Blood Prince (Harry Potter, #6) | Harry Potter and the Prisoner of Azkaban (Harry Potter, #3) | Heidi | I am Charlotte Simmons | Me Talk Pretty One Day | Quicksilver (The Baroque Cycle, #1) | The 158-Pound Marriage | ... | The Corrections | The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory | The Fellowship of the Ring (The Lord of the Rings, #1) | The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide to the Galaxy, #1) | The Known World | The Long Dark Tea-Time of the Soul (Dirk Gently, #2) | The Lord of the Rings (The Lord of the Rings, #1-3) | The Lord of the Rings: The Art of The Fellowship of the Ring | The Phantom Tollbooth | Tropic of Cancer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | | | | | | | | | | | ... | | | | | | | | | | |
| 3 | | | | | | | | | | | ... | | | | | | | | | | |
| 4 | | | | | | | | | | | ... | | | | | | | | | | |
| 7 | | | | | | | | | | | ... | | | | | | | | | | |
| 9 | | | | | | | | | | | ... | | | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53419 | | | | | | | | | | | ... | | | | | | | | | | |
| 53420 | | | | | | | | | | | ... | | | | | | | | | | |
| 53422 | | | | | | | | | | | ... | | | | | | | | | | |
| 53423 | | | | | | | | | | | ... | | | | | | | | | | |


In [17]:
# For each user, count how many of the selected reader's books they have also rated
user_book_count = book_read_df.T.notnull().sum()
user_book_count.max()

21

In [18]:
# IDs of users who have rated more than 30% of the selected reader's books
users_same_books = user_book_count[user_book_count > (book_read_df.shape[1] * 30 ) / 100].index
users_same_books

Index([  588,  1952,  5461,  6342,  9246, 10111, 10727, 10944, 11692, 11927,
       11945, 12381, 13544, 13794, 17984, 18031, 18361, 20467, 22602, 23576,
       23612, 24326, 26244, 26661, 28767, 29703, 32592, 32748, 32923, 33065,
       33716, 33872, 38080, 42404, 44397, 45554, 47800, 48559, 48687, 51166],
      dtype='int64', name='user_id')
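
The overlap filter can be reproduced on a small hypothetical matrix (`toy_user_df` and the other `toy_` names are illustrative, chosen so they do not overwrite the notebook's variables). The target reader always passes the filter, since they share all of their own books, which is why 20467 appears in the index above.

```python
import numpy as np
import pandas as pd

# Hypothetical user x book rating matrix (NaN = not rated); user 1 is the target reader
toy_user_df = pd.DataFrame(
    {"A": [5, 4, np.nan, 3], "B": [4, np.nan, 2, np.nan],
     "C": [3, 5, np.nan, np.nan], "D": [2, 4, np.nan, 5]},
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)
target = 1

# Books the target reader has rated
toy_book_read = toy_user_df.loc[target].dropna().index.tolist()

# For every user, how many of those books they have also rated
toy_counts = toy_user_df[toy_book_read].notnull().sum(axis=1)

# Keep users who rated more than 30% of the target's books (here: more than 1.2 books)
toy_same = toy_counts[toy_counts > len(toy_book_read) * 30 / 100].index
print(toy_counts.to_dict())
print(list(toy_same))
```

User 3 shares only one of the target's four books and is filtered out; users 2 and 4 stay in alongside the target.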

<a id = "4"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Determination of Similarity</div>

In [19]:
filted_df = book_read_df[book_read_df.index.isin(users_same_books)]
filted_df.head()

| user_id | Burmese Days | Daniel Deronda | Freakonomics: A Rogue Economist Explores the Hidden Side of Everything (Freakonomics, #1) | Harry Potter and the Half-Blood Prince (Harry Potter, #6) | Harry Potter and the Prisoner of Azkaban (Harry Potter, #3) | Heidi | I am Charlotte Simmons | Me Talk Pretty One Day | Quicksilver (The Baroque Cycle, #1) | The 158-Pound Marriage | ... | The Corrections | The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory | The Fellowship of the Ring (The Lord of the Rings, #1) | The Hitchhiker's Guide to the Galaxy (Hitchhiker's Guide to the Galaxy, #1) | The Known World | The Long Dark Tea-Time of the Soul (Dirk Gently, #2) | The Lord of the Rings (The Lord of the Rings, #1-3) | The Lord of the Rings: The Art of The Fellowship of the Ring | The Phantom Tollbooth | Tropic of Cancer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 588 | | | | 5.0 | | | 2.0 | | | | ... | | | | 4.0 | 4.0 | | 3.0 | | 4.0 | 4.0 |
| 1952 | | 4.0 | | | 5.0 | 4.0 | | | | | ... | | | | 5.0 | 4.0 | | 4.0 | 5.0 | 4.0 | |
| 5461 | | 4.0 | 4.0 | 3.0 | 5.0 | 5.0 | 4.0 | | | | ... | | | | 4.0 | | 4.0 | 4.0 | | | |
| 6342 | | 4.0 | 4.0 | | 5.0 | | | | | | ... | | | | 5.0 | 5.0 | | 4.0 | 4.0 | | |
| 9246 | | | | 1.0 | 3.0 | | 3.0 | | | | ... | | | | 5.0 | 5.0 | 3.0 | 4.0 | 2.0 | 5.0 | |


In [20]:
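# Pearson correlations between the filtered readers, flattened to (user, user) pairs.
# Caution: drop_duplicates() removes repeated values, not repeated user pairs.
corr_df = filted_df.T.corr().unstack().drop_duplicates()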
corr_df

user_id  user_id
588      588        1.000000
         1952       0.333333
         5461      -0.774597
         6342       1.000000
         9246      -0.161165
                      ...   
44397    51166      0.482382
45554    48559      0.157243
         51166     -0.904534
47800    48559      0.321288
48559    51166     -0.166667
Length: 582, dtype: float64
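
`drop_duplicates()` compares correlation values, not user pairs. Since the correlation matrix is symmetric, the value for (random_user, x) has already appeared as (x, random_user) whenever reader x is listed before random_user, so those pairs are dropped. The visible rows above fit this pattern: 45554 is paired only with 48559 and 51166, 48559 only with 51166, and all three readers in `top_readers` below have IDs above 20467. A minimal sketch on hypothetical data (`toy_df`, `target`), followed by a safer alternative that reads the target's column of the correlation matrix directly:

```python
import pandas as pd

# Hypothetical ratings of four shared books (columns) by four readers (rows)
toy_df = pd.DataFrame(
    [[4, 3, 2, 1],
     [2, 4, 1, 3],
     [3, 4, 1, 2],
     [1, 3, 4, 2]],
    index=[10, 20, 30, 40],
    columns=["b1", "b2", "b3", "b4"],
)
target = 20

# Full reader x reader Pearson correlation matrix (symmetric, 1.0 on the diagonal)
corr = toy_df.T.corr()

# Pattern used above: flatten to (user, user) pairs, then drop repeated values
dedup = corr.unstack().drop_duplicates()
print(dedup[target])  # reader 10 is missing: its value already appeared as (10, 20)

# Safer: take the target's column directly and remove the self-correlation
neighbours = corr[target].drop(target).sort_values(ascending=False)
print(neighbours)
```

Filtering `neighbours` with the same `> 0.70` threshold would then consider every reader, regardless of where their ID falls relative to the target's.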

In [21]:
top_readers = pd.DataFrame(corr_df[random_user][corr_df[random_user] > 0.70], columns=["corr"])
top_readers

| user_id | corr |
|---|---|
| 23612 | 0.871044 |
| 26661 | 0.782624 |
| 33716 | 0.744157 |


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; font-size:100%; text-align:left">

<h3 align="left"><font color='#6B8BA0'>🗨️ Comment: </font></h3>
    
Readers whose ratings correlate with the selected reader's at more than 0.70 (Pearson correlation).

<a id = "5"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Score Calculation</div>

In [22]:
top_readers_ratings = pd.merge(top_readers, df[["user_id", "book_id", "rating"]], how='inner', on="user_id")
top_readers_ratings

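| | user_id | corr | book_id | rating |
|---|---|---|---|---|
| 0 | 23612 | 0.871044 | 3 | 3 |
| 1 | 23612 | 0.871044 | 34 | 1 |
| 2 | 23612 | 0.871044 | 2 | 5 |
| 3 | 23612 | 0.871044 | 968 | 3 |
| 4 | 23612 | 0.871044 | 1 | 4 |
| ... | ... | ... | ... | ... |
| 72 | 33716 | 0.744157 | 36 | 3 |
| 73 | 33716 | 0.744157 | 647 | 3 |
| 74 | 33716 | 0.744157 | 304 | 5 |
| 75 | 33716 | 0.744157 | 6670 | 3 |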


In [23]:
# Weight each neighbour's rating by their correlation with the selected reader
top_readers_ratings['weighted_rating'] = top_readers_ratings['corr'] * top_readers_ratings['rating']
top_readers_ratings

| | user_id | corr | book_id | rating | weighted_rating |
|---|---|---|---|---|---|
| 0 | 23612 | 0.871044 | 3 | 3 | 2.613133 |
| 1 | 23612 | 0.871044 | 34 | 1 | 0.871044 |
| 2 | 23612 | 0.871044 | 2 | 5 | 4.355222 |
| 3 | 23612 | 0.871044 | 968 | 3 | 2.613133 |
| 4 | 23612 | 0.871044 | 1 | 4 | 3.484178 |
| ... | ... | ... | ... | ... | ... |
| 72 | 33716 | 0.744157 | 36 | 3 | 2.232472 |
| 73 | 33716 | 0.744157 | 647 | 3 | 2.232472 |
| 74 | 33716 | 0.744157 | 304 | 5 | 3.720787 |
| 75 | 33716 | 0.744157 | 6670 | 3 | 2.232472 |


In [24]:
recommendation_df = top_readers_ratings.pivot_table(values="weighted_rating", index="book_id", aggfunc="mean")
recommendation_df

| book_id | weighted_rating |
|---|---|
| 1 | 3.445153 |
| 2 | 4.038004 |
| 3 | 2.422803 |
| 5 | 3.444874 |
| 8 | 3.444874 |
| 10 | 3.307336 |
| 11 | 3.706028 |
| 13 | 3.665926 |
| 21 | 4.355222 |
| 24 | 4.355222 |
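
Taking the mean of `corr * rating` per book scales each score down by the neighbours' correlations (a 5 from a reader with correlation 0.871044 becomes 4.355222, as above), so the score mixes similarity with predicted rating. A common alternative, swapped in here and not used elsewhere in this notebook, divides by the sum of the correlations so the score stays on the 1-5 rating scale; the sketch also drops books the target reader has already rated. All data and names (`toy`, `toy_read`) are hypothetical:

```python
import pandas as pd

# Hypothetical neighbour ratings; corr = each neighbour's similarity to the target reader
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "corr":    [0.9, 0.9, 0.7, 0.7, 0.8],
    "book_id": [100, 200, 100, 300, 200],
    "rating":  [4, 5, 2, 5, 3],
})
toy_read = {300}  # hypothetical: books the target reader has already rated

# Drop books the target has read, then weight each rating by similarity
toy = toy[~toy["book_id"].isin(toy_read)].copy()
toy["weighted_rating"] = toy["corr"] * toy["rating"]

# Normalised weighted average per book: sum(corr * rating) / sum(corr)
grouped = toy.groupby("book_id")
score = (grouped["weighted_rating"].sum() / grouped["corr"].sum()).sort_values(ascending=False)
print(score)
```

Book 200 scores about 4.06 and book 100 scores 3.125, both readable as predicted ratings rather than similarity-discounted ones.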


In [25]:
books_recommend = recommendation_df[recommendation_df["weighted_rating"] > 3.5].sort_values(by="weighted_rating", ascending=False).head()
books_recommend

| book_id | weighted_rating |
|---|---|
| 21 | 4.355222 |
| 24 | 4.355222 |
| 25 | 4.355222 |
| 27 | 4.355222 |
| 1274 | 4.355222 |


In [26]:
books[books["book_id"].isin(books_recommend.index)]

| | book_id | original_publication_year | title | average_rating |
|---|---|---|---|---|
| 373 | 21 | 2003.0 | A Short History of Nearly Everything | 4.19 |
| 572 | 1274 | 1998.0 | Men Are from Mars, Women Are from Venus | 3.52 |
| 1459 | 24 | 2000.0 | In a Sunburned Country | 4.05 |
| 1975 | 25 | 1998.0 | I'm a Stranger Here Myself: Notes on Returning... | 3.89 |
| 2278 | 27 | 1991.0 | Neither Here nor There: Travels in Europe | 3.88 |


<center><img src="https://i.imgur.com/d9BAbkF.png" width="800" height="800"></center>

<a id = "6"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Model-Based Recommendation System✨</p>

In [27]:
import pandas as pd
from surprise import Reader, SVD, Dataset, accuracy
from surprise.model_selection import GridSearchCV, train_test_split, cross_validate
pd.set_option('display.max_columns', None)

In [28]:
df.head()

| | book_id | original_publication_year | title | average_rating | user_id | rating |
|---|---|---|---|---|---|---|
| 0 | 3 | 1997.0 | Harry Potter and the Sorcerer's Stone (Harry P... | 4.44 | 314 | 3 |
| 1 | 3 | 1997.0 | Harry Potter and the Sorcerer's Stone (Harry P... | 4.44 | 588 | 1 |
| 2 | 3 | 1997.0 | Harry Potter and the Sorcerer's Stone (Harry P... | 4.44 | 2077 | 2 |
| 3 | 3 | 1997.0 | Harry Potter and the Sorcerer's Stone (Harry P... | 4.44 | 2487 | 3 |
| 4 | 3 | 1997.0 | Harry Potter and the Sorcerer's Stone (Harry P... | 4.44 | 2900 | 3 |


In [29]:
# Pick one random user ID
user_id = df["user_id"].sample(1,random_state=42).values.tolist()[0]
user_id

45029

In [30]:
# Books the sampled user has rated
sample_df = df[df["user_id"] == user_id]
sample_df

| | book_id | original_publication_year | title | average_rating | user_id | rating |
|---|---|---|---|---|---|---|
| 5311 | 2165 | 1952.0 | The Old Man and the Sea | 3.73 | 45029 | 4 |
| 7552 | 5359 | 1993.0 | The Client | 3.97 | 45029 | 4 |
| 19233 | 2373 | 1997.0 | The Bone Collector (Lincoln Rhyme, #1) | 4.18 | 45029 | 4 |
| 61309 | 4630 | 1937.0 | To Have and Have Not | 3.57 | 45029 | 4 |
| 62878 | 5548 | 1988.0 | What Do You Care What Other People Think? | 4.28 | 45029 | 4 |


In [31]:
reader = Reader(rating_scale=(1, 5))

In [32]:
# Load the ratings into a Surprise Dataset; the columns must be user ID, item ID and rating, in that order
data = Dataset.load_from_df(df[["user_id", "book_id", "rating"]], reader)

In [33]:
# Hold out 25% of the ratings, fit SVD on the remaining 75%, then predict the held-out ratings
trainset, testset = train_test_split(data, test_size=.25, random_state = 42)
svd_model = SVD(random_state = 42)
svd_model.fit(trainset)
predictions = svd_model.test(testset)
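
For reference, Surprise documents the `SVD` estimate as the biased matrix-factorization rule `r_hat_ui = mu + b_u + b_i + q_i · p_u`, where `mu` is the global mean rating, `b_u` and `b_i` are user and item biases, and `p_u` and `q_i` are latent factor vectors. A minimal numeric sketch with made-up parameters (not values learned by `svd_model`):

```python
import numpy as np

# Made-up parameters for one user u and one book i with k = 3 latent factors
mu = 3.9                            # global mean rating
b_u, b_i = 0.2, -0.1                # user bias and item bias
p_u = np.array([0.5, -0.2, 0.1])    # user factor vector
q_i = np.array([0.4, 0.3, -0.2])    # item factor vector

# Biased matrix-factorization estimate: mu + b_u + b_i + q_i . p_u
r_hat = mu + b_u + b_i + q_i @ p_u

# Surprise's predict() clips estimates to the rating scale by default;
# here the scale is assumed to be 1-5, matching Reader(rating_scale=(1, 5))
r_hat_clipped = float(np.clip(r_hat, 1, 5))
print(round(r_hat_clipped, 2))
```

`svd_model` learns `mu`, the biases and the factor vectors from `trainset` by stochastic gradient descent; the sketch only shows how they combine into one prediction.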

In [34]:
accuracy.rmse(predictions)

RMSE: 0.9115


0.9114649631121562
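
`accuracy.rmse` reports the root-mean-square error between the held-out ratings and the model's estimates, so 0.9115 means predictions are off by roughly 0.9 stars on the 1-5 scale in the RMS sense. The computation itself, on hypothetical (true rating, estimate) pairs:

```python
import math

# Hypothetical (true rating, predicted rating) pairs, like (r_ui, est) in Surprise predictions
pairs = [(4, 3.5), (5, 4.2), (2, 3.0), (3, 3.3)]

# RMSE = sqrt(mean((true - predicted) ** 2))
rmse = math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))
print(f"RMSE: {rmse:.4f}")
```

Squaring before averaging makes RMSE penalize large misses, such as the 1-star error on the third pair, more than several small ones.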

In [35]:
# book_id of every rating made by users other than the sampled one (IDs repeat across users)
df["book_id"][~(df["user_id"] == user_id)]

0           3
1           3
2           3
3           3
4           3
         ... 
79696    8914
79697    8914
79698    8914
79699    8914
79700    8914
Name: book_id, Length: 79696, dtype: int64

In [36]:
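# Books the sampled user has not rated: every book in df minus the ones they rated
already_read = set(df.loc[df["user_id"] == user_id, "book_id"])
didnt_read = [b for b in df["book_id"].unique() if b not in already_read]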

In [37]:
# Recommend the `sug` books with the highest SVD-predicted rating among those the user has not rated
def suggest(df, user_id, sug):
    # Books the user has already rated, and the remaining candidates
    already_read = set(df.loc[df["user_id"] == user_id, "book_id"])
    didnt_read = [b for b in df["book_id"].unique() if b not in already_read]
    # Predicted rating (.est) from the fitted svd_model for each candidate book
    temp_dict = {i: svd_model.predict(uid=user_id, iid=i).est for i in didnt_read}
    suggestions = (pd.DataFrame(list(temp_dict.items()), columns=["book_id", "possible_rate"])
                   .sort_values(by="possible_rate", ascending=False)
                   .head(sug))
    # Attach titles from the books table
    merged = pd.merge(suggestions, books[["book_id", "title"]], how="inner", on="book_id")
    return merged

In [38]:
suggest(df,user_id,5)

| | book_id | possible_rate | title |
|---|---|---|---|
| 0 | 9566 | 4.823831 | Still Life with Woodpecker |
| 1 | 9531 | 4.760682 | Peter and the Shadow Thieves (Peter and the St... |
| 2 | 3885 | 4.718368 | The Taste of Home Cookbook |
| 3 | 4708 | 4.709676 | The Beautiful and Damned |
| 4 | 6423 | 4.66215 | High Five (Stephanie Plum, #5) |


<div style="border-radius:10px; border:#6B8BA0 solid; padding: 15px; background-color: #F2EADF; font-size:100%; text-align:left">

<h3 align="left"><font color='#6B8BA0'>🗨️ Comment: </font></h3>
    
The model suggests these five books, which have the highest predicted ratings, to the sampled reader.

<center><img src="https://i.imgur.com/TKcovIp.png" width="800" height="800"></center>