# Book Recommendation System Using Collaborative Filtering

## Introduction
This notebook builds a book recommendation system using **Collaborative Filtering**.
It uses the **K-Nearest Neighbors (KNN)** algorithm to find books whose rating patterns across users are similar.

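Collaborative filtering recommends items from patterns in user ratings rather than from the content of the books. This notebook takes the item-based approach: two books count as similar when the same users rate them similarly, so the nearest neighbors of a book a reader enjoyed become the recommendations.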
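To make the idea concrete, here is a minimal sketch on a toy rating matrix. The titles, the ratings, and the helper `euclidean_neighbors` are hypothetical and only illustrate the principle; the notebook itself relies on scikit-learn for the neighbor search.

```python
import numpy as np

# Toy book-by-user rating matrix (hypothetical data): rows are books, columns are users.
# A 0 means "no rating", mirroring how missing values are filled later in the notebook.
titles = ["Book A", "Book B", "Book C"]
ratings = np.array([
    [9.0, 8.0, 0.0, 0.0],   # Book A
    [8.0, 9.0, 0.0, 1.0],   # Book B: rated highly by the same users as Book A
    [0.0, 0.0, 7.0, 9.0],   # Book C: rated highly by different users
])

def euclidean_neighbors(matrix, row, k):
    """Return the indices of the k rows closest to `row`, excluding `row` itself."""
    distances = np.linalg.norm(matrix - matrix[row], axis=1)
    order = np.argsort(distances, kind="stable")
    return [int(i) for i in order if i != row][:k]

nearest = euclidean_neighbors(ratings, row=0, k=1)
print(titles[nearest[0]])
```

Book B comes out as the closest neighbor of Book A because the same two users rated both books highly, while Book C was rated by a different group.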

### Importing Necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Downloading The Datasets

In [2]:
import gdown
import os

# Function to download a file from Google Drive if it doesn't exist
def download_file(file_id, output):
    # Create the /data/ directory if it doesn't exist
    if not os.path.exists('data'):
        os.makedirs('data')

    # Full path to the file in /data/
    output_path = os.path.join('data', output)

    # Check if the file already exists
    if not os.path.exists(output_path):
        url = f"https://drive.google.com/uc?id={file_id}"
        gdown.download(url, output_path, quiet=False)
        print(f"{output} downloaded.")
    else:
        print(f"{output} already exists.")

# File IDs from Google Drive
books_file = '1U4kz_Y4A9fsnXPleHV_wYJ_yB4XxsJXs'
ratings_file = '1hMgOVMci3iaGLRrUKI-PRMBinPpQIXcP'
users_file = '195Mgo4sKzpJ9vfYqiVg7d_NL4vX4Iexj'

# Download files if they do not already exist in /data/
download_file(books_file, "Books.csv")
download_file(ratings_file, "Ratings.csv")
download_file(users_file, "Users.csv")

Books.csv already exists.
Ratings.csv already exists.
Users.csv already exists.


### Data Preprocessing

In [3]:
books = pd.read_csv('data/Books.csv', low_memory=False)

In [4]:
books.sample(5)

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
259793,312254210,Stein on Writing,Sol Stein,2000,St. Martin's Press,http://images.amazon.com/images/P/0312254210.0...,http://images.amazon.com/images/P/0312254210.0...,http://images.amazon.com/images/P/0312254210.0...
119259,61009571,The Wilderness Road,James Reasoner,1996,Harper Mass Market Paperbacks (Mm),http://images.amazon.com/images/P/0061009571.0...,http://images.amazon.com/images/P/0061009571.0...,http://images.amazon.com/images/P/0061009571.0...
198312,866254064,Frontiersmen (Wild West in American History),Gail Stewart,1990,Rourke Pub Group,http://images.amazon.com/images/P/0866254064.0...,http://images.amazon.com/images/P/0866254064.0...,http://images.amazon.com/images/P/0866254064.0...
115808,395186498,Curious George Takes a Job (Sandpiper Houghton...,H. A. Rey,1974,Houghton Mifflin,http://images.amazon.com/images/P/0395186498.0...,http://images.amazon.com/images/P/0395186498.0...,http://images.amazon.com/images/P/0395186498.0...
246005,404077749,"Aucassin and Nicolette, and Other Mediaeval Ro...",Eugene Mason,1972,Ams Pr,http://images.amazon.com/images/P/0404077749.0...,http://images.amazon.com/images/P/0404077749.0...,http://images.amazon.com/images/P/0404077749.0...


In [5]:
print(books['Year-Of-Publication'].unique())

['2002' '2001' '1991' '1999' '2000' '1993' '1996' '1988' '2004' '1998'
 '1994' '2003' '1997' '1983' '1979' '1995' '1982' '1985' '1992' '1986'
 '1978' '1980' '1952' '1987' '1990' '1981' '1989' '1984' '0' '1968' '1961'
 '1958' '1974' '1976' '1971' '1977' '1975' '1965' '1941' '1970' '1962'
 '1973' '1972' '1960' '1966' '1920' '1956' '1959' '1953' '1951' '1942'
 '1963' '1964' '1969' '1954' '1950' '1967' '2005' '1957' '1940' '1937'
 '1955' '1946' '1936' '1930' '2011' '1925' '1948' '1943' '1947' '1945'
 '1923' '2020' '1939' '1926' '1938' '2030' '1911' '1904' '1949' '1932'
 '1928' '1929' '1927' '1931' '1914' '2050' '1934' '1910' '1933' '1902'
 '1924' '1921' '1900' '2038' '2026' '1944' '1917' '1901' '2010' '1908'
 '1906' '1935' '1806' '2021' '2012' '2006' 'DK Publishing Inc' 'Gallimard'
 '1909' '2008' '1378' '1919' '1922' '1897' '2024' '1376' '2037']


In [6]:
books['Year-Of-Publication'].isnull().sum()

0

In [7]:
books[pd.to_numeric(books['Year-Of-Publication'], errors='coerce').isnull()]

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
209538,078946697X,"DK Readers: Creating the X-Men, How It All Beg...",2000,DK Publishing Inc,http://images.amazon.com/images/P/078946697X.0...,http://images.amazon.com/images/P/078946697X.0...,http://images.amazon.com/images/P/078946697X.0...,
220731,2070426769,"Peuple du ciel, suivi de 'Les Bergers\"";Jean-M...",2003,Gallimard,http://images.amazon.com/images/P/2070426769.0...,http://images.amazon.com/images/P/2070426769.0...,http://images.amazon.com/images/P/2070426769.0...,
221678,0789466953,"DK Readers: Creating the X-Men, How Comic Book...",2000,DK Publishing Inc,http://images.amazon.com/images/P/0789466953.0...,http://images.amazon.com/images/P/0789466953.0...,http://images.amazon.com/images/P/0789466953.0...,


In [8]:
books['Year-Of-Publication'] = pd.to_numeric(books['Year-Of-Publication'], errors='coerce')

In [9]:
print(books['Year-Of-Publication'].isnull().sum())

3


In [10]:
books['Year-Of-Publication'] = books['Year-Of-Publication'].fillna(0)

In [11]:
books[books['Year-Of-Publication'] == 0]['Year-Of-Publication'].count()

4621

In [12]:
print(books['Year-Of-Publication'].isnull().sum())

0


In [13]:
books.sample(5)

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
5840,0553277723,"NIGHT SHE DIED, THE",DOROTHY SIMPSON,1985.0,Crimeline,http://images.amazon.com/images/P/0553277723.0...,http://images.amazon.com/images/P/0553277723.0...,http://images.amazon.com/images/P/0553277723.0...
79870,3492225535,Celibidache,Klaus Umbach,1998.0,Piper Verlag GmbH,http://images.amazon.com/images/P/3492225535.0...,http://images.amazon.com/images/P/3492225535.0...,http://images.amazon.com/images/P/3492225535.0...
108981,0670840084,More Please,Barry Humphries,1992.0,Viking,http://images.amazon.com/images/P/0670840084.0...,http://images.amazon.com/images/P/0670840084.0...,http://images.amazon.com/images/P/0670840084.0...
173545,0553296655,Prayers to Broken Stones,Dan Simmons,1992.0,Bantam,http://images.amazon.com/images/P/0553296655.0...,http://images.amazon.com/images/P/0553296655.0...,http://images.amazon.com/images/P/0553296655.0...
100631,207042121X,Balbala,Abdourahman A. Waberi,2002.0,Gallimard,http://images.amazon.com/images/P/207042121X.0...,http://images.amazon.com/images/P/207042121X.0...,http://images.amazon.com/images/P/207042121X.0...


In [14]:
books['Year-Of-Publication'].dtype

dtype('float64')

In [15]:
# NaNs were already replaced with 0 above, so the column can be cast directly
books['Year-Of-Publication'] = books['Year-Of-Publication'].astype(int)

In [16]:
books['Year-Of-Publication'].dtype

dtype('int32')

In [17]:
books.sample(5)

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
162713,0451182766,Double Dead,Gary Hardwick,1998,Onyx Books,http://images.amazon.com/images/P/0451182766.0...,http://images.amazon.com/images/P/0451182766.0...,http://images.amazon.com/images/P/0451182766.0...
150707,0785280707,The Legend of Storey County: A Novel,Brock Thoene,1995,Thomas Nelson Inc,http://images.amazon.com/images/P/0785280707.0...,http://images.amazon.com/images/P/0785280707.0...,http://images.amazon.com/images/P/0785280707.0...
20528,042518885X,The Wicked Flea: A Dog Lover's Mystery (Dog Lo...,Susan Conant,2003,Berkely Prime Crime,http://images.amazon.com/images/P/042518885X.0...,http://images.amazon.com/images/P/042518885X.0...,http://images.amazon.com/images/P/042518885X.0...
54084,0451145747,"Of Quests and Kings (Castaways in Time, No 3)",Robert Adams,1986,New Amer Library,http://images.amazon.com/images/P/0451145747.0...,http://images.amazon.com/images/P/0451145747.0...,http://images.amazon.com/images/P/0451145747.0...
182430,1857994353,A Breath of Fresh Air,Erica James,1996,Phoenix mass market p/bk,http://images.amazon.com/images/P/1857994353.0...,http://images.amazon.com/images/P/1857994353.0...,http://images.amazon.com/images/P/1857994353.0...


In [18]:
books.shape

(271360, 8)

In [19]:
books.columns

Index(['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher',
       'Image-URL-S', 'Image-URL-M', 'Image-URL-L'],
      dtype='object')

In [20]:
books = books[['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher', 'Image-URL-M']]

In [21]:
books.sample(5)

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M
8188,679892206,"Secret Ingredient (Stepping Stone, paper)",G. E. Stanley,1999,Random House Trade,http://images.amazon.com/images/P/0679892206.0...
261669,446670286,The Entertainment Weekly Guide to the Greatest...,Entertainment Weekly,1994,Warner Books,http://images.amazon.com/images/P/0446670286.0...
113090,60006293,Home to Harmony,Philip Gulley,2002,Harper SanFrancisco,http://images.amazon.com/images/P/0060006293.0...
242208,1567181635,Falcon Feather &amp; Valkyrie Sword: Feminine ...,D. J. Conway,1995,Llewellyn Pubns,http://images.amazon.com/images/P/1567181635.0...
53700,843951249,Danelaw,Susan Squires,2003,Dorchester Publishing Company,http://images.amazon.com/images/P/0843951249.0...


In [22]:
books = books.rename(columns={
    'Book-Title': 'title',
    'Book-Author': 'author',
    'Year-Of-Publication': 'year',
    'Publisher': 'publisher',
    'Image-URL-M': 'image-url'
})

In [23]:
books.sample(5)

,ISBN,title,author,year,publisher,image-url
249031,8516023508,A Colina Dos Suspiros,Moacyr Scliar,0,Luso Brazilian Books,http://images.amazon.com/images/P/8516023508.0...
9826,673522946,Technical Writing,John M. Lannon,1993,Addison-Wesley Pub Co,http://images.amazon.com/images/P/0673522946.0...
91331,809281090,Winning Bodybuilding,Franco. Columbu,1977,McGraw-Hill,http://images.amazon.com/images/P/0809281090.0...
65988,1551664216,Second Thoughts (Jackie Kaminsky Mysteries),Margot Dalton,1998,Mira,http://images.amazon.com/images/P/1551664216.0...
166948,2266023128,L Enfant Noir,Camara Laye,2001,Distribooks Inc,http://images.amazon.com/images/P/2266023128.0...


In [24]:
users = pd.read_csv('data/Users.csv')
ratings = pd.read_csv('data/Ratings.csv')

In [25]:
users.sample(5)

,User-ID,Location,Age
36493,36494,"hawthorne, california, usa",
238206,238207,"ft. irwin, california, usa",
48525,48526,"shelbyville, kentucky, usa",22.0
154689,154690,"edinburgh, scotland, united kingdom",27.0
202799,202800,"centreville, virginia, usa",32.0


In [26]:
ratings.sample(5)

,User-ID,ISBN,Book-Rating
398790,95923,446672211,0
783139,189623,349101779,0
111217,25757,3499131706,0
1056281,252222,754804461,8
1071055,255979,3596152356,0


In [27]:
print(f"Shape of books: {books.shape}")
print(f"Shape of users: {users.shape}")
print(f"Shape of ratings: {ratings.shape}")

Shape of books: (271360, 6)
Shape of users: (278858, 3)
Shape of ratings: (1149780, 3)


In [28]:
users = users.rename(columns={
    'User-ID': 'user-id',
    'Location': 'location',
    'Age': 'age'
})
ratings = ratings.rename(columns={
    'User-ID': 'user-id',
    'Book-Rating': 'rating'
})

In [29]:
len(ratings['user-id'].unique())

105283

In [30]:
# Keep only active users: those with more than 200 ratings
x = ratings['user-id'].value_counts() > 200
x[x]

user-id
11676     True
198711    True
153662    True
98391     True
35859     True
          ... 
274808    True
28634     True
59727     True
268622    True
188951    True
Name: count, Length: 899, dtype: bool

In [31]:
ratings = ratings[ratings['user-id'].isin(x[x].index)]

In [32]:
ratings.shape

(526356, 3)

In [33]:
len(ratings['user-id'].unique())

899

In [34]:
ratings.sample(5)

,user-id,ISBN,rating
577231,138844,449219461,0
1024851,245827,307155498,0
295303,69971,345418611,0
365682,87746,679405283,0
298254,70594,553801945,8


In [35]:
ratings_with_books = ratings.merge(books, on='ISBN')

In [36]:
ratings_with_books.shape

(487671, 8)

In [37]:
ratings_with_books.sample(5)

,user-id,ISBN,rating,title,author,year,publisher,image-url
180529,102967,0449219550,0,Longshot,Dick Francis,1994,Fawcett Books,http://images.amazon.com/images/P/0449219550.0...
468725,264321,0345384377,0,Sole Survivor,DEAN KOONTZ,1997,Ballantine Books,http://images.amazon.com/images/P/0345384377.0...
323122,187517,0399148019,8,The Absence of Nectar,Kathy Hepinstall,2001,G. P. Putnam's Sons,http://images.amazon.com/images/P/0399148019.0...
289321,170229,039457060X,0,Reasonable Creatures: Essays on Women and Femi...,Katha Pollitt,1994,Alfred A. Knopf,http://images.amazon.com/images/P/039457060X.0...
141439,81492,0446356956,0,The Fortune,Michael Korda,1990,Warner Books,http://images.amazon.com/images/P/0446356956.0...


In [38]:
num_ratings = ratings_with_books.groupby('title')['rating'].count().reset_index()

In [39]:
num_ratings.sample(5)

,title,rating
64938,Keeping Fit (Let Me Read : Level 1),1
94832,Puppy Love (Beethoven's 2nd),4
150369,Unsafe Keeping,1
109391,Star Trek: Phase II : The Making of the Lost S...,2
135532,The Rancher's Hand-Picked Bride,5


In [40]:
num_ratings = num_ratings.rename(columns={
    'rating': 'num-of-ratings'
})

In [41]:
num_ratings.sample(5)

,title,num-of-ratings
8762,American Exorcism: Expelling Demons in the Lan...,1
15353,Best of Helpful Hints,1
110016,Still Lickin' the Spoon (And Other Confessions...,1
30820,David Hockney's Dog Days,1
124110,The Final Deduction,4


In [42]:
final_ratings = ratings_with_books.merge(num_ratings, on='title')

In [43]:
final_ratings.sample(5)

,user-id,ISBN,rating,title,author,year,publisher,image-url,num-of-ratings
393938,227447,1551668920,0,A Season Of Miracles,Heather Graham,2002,Mira,http://images.amazon.com/images/P/1551668920.0...,15
433020,242824,553441531,0,"A Moment in Time (Loveswept, No 489)",Helen Mittermeyer,1991,Loveswept,http://images.amazon.com/images/P/0553441531.0...,3
53775,29259,345307674,0,Return of the Jedi (Star Wars),James Kahn,1983,Del Rey Books,http://images.amazon.com/images/P/0345307674.0...,21
473399,266226,425193918,0,Mrs. Jeffries Sweeps the Chimney,Emily Brightwell,2004,Berkley Publishing Group,http://images.amazon.com/images/P/0425193918.0...,3
413186,234828,671879472,8,The SECRET SANTA (NANCY DREW NOTEBOOK 3) : THE...,Carolyn Keene,1994,Aladdin,http://images.amazon.com/images/P/0671879472.0...,1


In [44]:
final_ratings.shape

(487671, 9)

In [45]:
# Keep only books that received at least 50 ratings from the remaining users
final_ratings = final_ratings[final_ratings['num-of-ratings'] >= 50]

In [46]:
final_ratings.sample(5)

,user-id,ISBN,rating,title,author,year,publisher,image-url,num-of-ratings
158188,93047,805063897,0,Nickel and Dimed: On (Not) Getting By in America,Barbara Ehrenreich,2002,Owl Books,http://images.amazon.com/images/P/0805063897.0...,112
224254,129358,345313860,0,"The Vampire Lestat (Vampire Chronicles, Book II)",ANNE RICE,1986,Ballantine Books,http://images.amazon.com/images/P/0345313860.0...,123
473145,266226,312983867,0,Hard Eight : A Stephanie Plum Novel (A Stephan...,Janet Evanovich,2003,St. Martin's Paperbacks,http://images.amazon.com/images/P/0312983867.0...,112
273205,158295,671741195,0,The Cradle Will Fall,Mary Higgins Clark,1991,Pocket,http://images.amazon.com/images/P/0671741195.0...,58
461002,258185,449221490,0,L Is for Lawless,Sue Grafton,1996,Fawcett Books,http://images.amazon.com/images/P/0449221490.0...,70


In [47]:
final_ratings.shape

(61853, 9)

In [48]:
final_ratings = final_ratings.drop_duplicates(['title', 'user-id'])

In [49]:
final_ratings.shape

(59850, 9)

In [50]:
book_pivot = final_ratings.pivot_table(columns='user-id', index='title', values='rating')

In [51]:
book_pivot

user-id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
title,,,,,,,,,,,,,,,,,,,,,
1984,9.0,,,,,,,,,,...,,,,,,0.0,,,,
1st to Die: A Novel,,,,,,,,,,,...,,,,,,,,,,
2nd Chance,,10.0,,,,,,,,,...,,,,0.0,,,,,0.0,
4 Blondes,,,,,,,,,,0.0,...,,,,,,,,,,
84 Charing Cross Road,,,,,,,,,,,...,,,,,,10.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,7.0,,...,,,,,,0.0,,,,
You Belong To Me,,,,,,,,,,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,,,0.0,...,,,,,,0.0,,,,
Zoya,,,,,,,,,,,...,,,,,,,,,,


In [52]:
book_pivot = book_pivot.fillna(0)

In [53]:
book_pivot

user-id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
title,,,,,,,,,,,,,,,,,,,,,
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
84 Charing Cross Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,7.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Sparse Matrix Creation
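The pivot table is mostly zeros because each user has rated only a small fraction of the books. In the matrix built here, only 14,961 of the 742 × 888 = 658,896 cells (about 2.3%) hold a nonzero value, so we convert the pivot table to a sparse matrix format, which stores only the nonzero entries and reduces memory usage.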
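As a rough illustration of what the Compressed Sparse Row (CSR) layout keeps, the sketch below builds the three CSR arrays by hand for a tiny matrix. The helper `to_csr` and the toy data are hypothetical; in the notebook, `scipy.sparse.csr_matrix` does this work and exposes the same arrays as `.data`, `.indices`, and `.indptr`.

```python
import numpy as np

def to_csr(dense):
    """Build the three CSR arrays (data, indices, indptr) from a 2-D array."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        cols = np.flatnonzero(row)          # column positions of nonzero entries
        data.extend(row[cols].tolist())     # the nonzero values themselves
        indices.extend(cols.tolist())       # their column indices
        indptr.append(len(data))            # where the next row starts in `data`
    return data, indices, indptr

# Toy rating matrix (hypothetical data): mostly zeros, like the real pivot table
dense = np.array([
    [9.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 10.0, 0.0],
    [0.0, 7.0, 0.0, 7.0],
])

data, indices, indptr = to_csr(dense)
density = np.count_nonzero(dense) / dense.size

print(data, indices, indptr)
print(f"density: {density:.2%}")
```

Only four values and their positions are stored instead of twelve cells. The saving grows with the share of zeros in the matrix.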

In [54]:
from scipy.sparse import csr_matrix

In [55]:
book_sparse = csr_matrix(book_pivot)

In [56]:
book_sparse

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 14961 stored elements and shape (742, 888)>

### K-Nearest Neighbors Model Training
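We use scikit-learn's `NearestNeighbors` with brute-force search to find similar books. Each book is represented by its row of user ratings, and with the default settings (Minkowski metric with `p=2`) the distance between two books is the Euclidean distance between their rating vectors. Fitting the model stores the matrix so that neighbors can be looked up later.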
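The choice of distance affects which books count as neighbors. The sketch below compares Euclidean distance with cosine distance on hypothetical rating vectors. Cosine distance is not used in this notebook; it is shown only as a common alternative, which `NearestNeighbors` accepts via `metric='cosine'`.

```python
import numpy as np

def euclidean_distance(u, v):
    """Straight-line distance between two rating vectors."""
    return float(np.linalg.norm(u - v))

def cosine_distance(u, v):
    """One minus the cosine of the angle between two rating vectors."""
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical rating vectors for three books across three users
query = np.array([5.0, 5.0, 0.0])
same_pattern = np.array([10.0, 10.0, 0.0])   # same users, higher scores
partial_overlap = np.array([5.0, 0.0, 0.0])  # only one rater in common

for name, other in [("same_pattern", same_pattern), ("partial_overlap", partial_overlap)]:
    print(name,
          round(euclidean_distance(query, other), 3),
          round(cosine_distance(query, other), 3))
```

Euclidean distance ranks `partial_overlap` closer because it is sensitive to the magnitude of the ratings, while cosine distance ranks `same_pattern` closer because it compares only the direction of the vectors.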

In [57]:
from sklearn.neighbors import NearestNeighbors
model = NearestNeighbors(algorithm='brute')

In [58]:
model.fit(book_sparse)

In [59]:
distance, suggestion = model.kneighbors(book_pivot.iloc[237,:].values.reshape(1, -1), n_neighbors=6)

In [60]:
distance

array([[ 0.        , 67.75691847, 68.05145112, 72.277244  , 75.81556568,
        76.30203143]])

In [61]:
suggestion

array([[237, 238, 240, 241, 184, 536]], dtype=int64)

In [62]:
for book in suggestion[0]:
    print(book_pivot.index[book])

Harry Potter and the Chamber of Secrets (Book 2)
Harry Potter and the Goblet of Fire (Book 4)
Harry Potter and the Prisoner of Azkaban (Book 3)
Harry Potter and the Sorcerer's Stone (Book 1)
Exclusive
The Cradle Will Fall


In [63]:
book_pivot.index[237]

'Harry Potter and the Chamber of Secrets (Book 2)'

In [64]:
book_name = book_pivot.index

In [65]:
book_name

Index(['1984', '1st to Die: A Novel', '2nd Chance', '4 Blondes',
       '84 Charing Cross Road', 'A Bend in the Road', 'A Case of Need',
       'A Child Called \It\": One Child's Courage to Survive"',
       'A Civil Action', 'A Cry In The Night',
       ...
       'Winter Solstice', 'Wish You Well', 'Without Remorse',
       'Wizard and Glass (The Dark Tower, Book 4)', 'Wuthering Heights',
       'Year of Wonders', 'You Belong To Me',
       'Zen and the Art of Motorcycle Maintenance: An Inquiry into Values',
       'Zoya', '\O\" Is for Outlaw"'],
      dtype='object', name='title', length=742)

### Saving Model Artifacts
Once the model is fitted, we save it together with the supporting data (the book titles, the filtered ratings, and the book pivot table) using `pickle`, so they can be loaded later without repeating the preprocessing and fitting steps.
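Loading an artifact back is the mirror image of saving it. The following sketch round-trips a small stand-in object through a temporary directory; the dictionary and file name are hypothetical placeholders for the real model and data.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for an artifact such as the list of book titles
artifact = {"titles": ["Book A", "Book B"], "n_neighbors": 6}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "artifact.pkl")

    # Write the object; the context manager closes the file afterwards
    with open(path, "wb") as f:
        pickle.dump(artifact, f)

    # Read it back into a new object
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == artifact)
```

Because unpickling can execute arbitrary code, pickle files should only be loaded from sources you trust.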

In [66]:
import os
import pickle

# Create the output directory first, then close each file after writing
os.makedirs('artifacts', exist_ok=True)
artifacts = {'model': model, 'book_name': book_name,
             'final_ratings': final_ratings, 'book_pivot': book_pivot}
for name, obj in artifacts.items():
    with open(f'artifacts/{name}.pkl', 'wb') as f:
        pickle.dump(obj, f)

### Book Recommendation Function
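Finally, we define a function that takes a book title and returns the titles of its nearest neighbors in the KNN model. The query book is its own closest match (distance 0), so the function requests `n_neighbors + 1` results and the first title in the list is the input book itself.

In [67]:
def recommend_book(book_name, n_neighbors=6):
    # Locate the row of the query title in the pivot table
    book_id = np.where(book_pivot.index == book_name)[0][0]
    # Ask for one extra neighbor because the query book matches itself at distance 0
    distance, suggestion = model.kneighbors(book_pivot.iloc[book_id,:].values.reshape(1, -1), n_neighbors=n_neighbors+1)
    titles = [book_pivot.index[book] for book in suggestion[0]]
    for title in titles:
        print(title)
    return titles

recommended_books = recommend_book(book_name='Harry Potter and the Chamber of Secrets (Book 2)', n_neighbors=10)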

Harry Potter and the Chamber of Secrets (Book 2)
Harry Potter and the Goblet of Fire (Book 4)
Harry Potter and the Prisoner of Azkaban (Book 3)
Harry Potter and the Sorcerer's Stone (Book 1)
Exclusive
The Cradle Will Fall
Jacob Have I Loved
Tom Clancy's Op-Center (Tom Clancy's Op Center (Paperback))
The Witness
Toxin
Truly, Madly Manhattan
