## Book Recommendation Project

### Books Dataset

Books are identified by their ISBN codes.  
Additionally, content-based information is included, such as **Book-Title**, **Book-Author**, **Year-Of-Publication**, and **Publisher**, which have been retrieved from Amazon Web Services.  
If a book has multiple authors, only the first author appears in the data.

Also included are cover image URLs in three sizes:  
**Image-URL-S**, **Image-URL-M**, **Image-URL-L** (small, medium, large).  
These URLs direct to Amazon's website.

### Ratings Dataset

This dataset contains book rating information.  
Ratings (**Book-Rating**) can be:
- **explicit**, on a scale of 1–10 (higher value = better rating), or  
- **implicit**, indicated by a value of 0 (user has not provided a numerical rating).
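Separating the two kinds of entries is a simple filter on the rating value. Below is a minimal plain-Python sketch. Its sample data are the first five rows of Ratings.csv, as displayed in the ratings output of this notebook.

```python
# Sample (User-ID, ISBN, Book-Rating) rows from the head of Ratings.csv
ratings = [
    (276725, "034545104X", 0),
    (276726, "0155061224", 5),
    (276727, "0446520802", 0),
    (276729, "052165615X", 3),
    (276729, "0521795028", 6),
]

# Explicit ratings are scores on the 1-10 scale; implicit entries are stored as 0
explicit = [row for row in ratings if 1 <= row[2] <= 10]
implicit = [row for row in ratings if row[2] == 0]

print(f"explicit: {len(explicit)}, implicit: {len(implicit)}")
```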

## Project Objective

The project's objective is to build a book recommendation system that utilizes the Surprise library to implement a user-specific recommendation model. The system aims to predict what kinds of books an individual user is likely to appreciate, based on previous ratings and the behavior of other users.

## Project Components

### 1. Data Preprocessing and Quality Checking

- Merging book and rating data
- Removing invalid ISBN codes
- Handling implicit entries (0-ratings)
- Possible filtering of infrequent users and books

### 2. Building a Recommendation Model with the Surprise Library

- Training the model on user–book ratings
- Experimenting with different algorithms (e.g., **SVD**, **KNNBasic**, **KNNWithMeans**, **BaselineOnly**)
- Evaluating model performance with cross-validation (MAE, RMSE)

### 3. Generating Predictions and Recommendations

- Using an anti-test set to predict ratings for books the user has not yet rated
- Creating user-specific **Top-N recommendations**

### 4. Analysis and Interpretation of Results

- Examining the model's accuracy and its limitations
- Considering the impact of data structure on model performance
- Presenting possibilities for further development (e.g., content-based enrichment, hybrid models)

The project's end result is a functional, prototype-level book recommendation system that predicts user preferences from rating data and provides personalized book suggestions.

In [26]:
import random
from collections import defaultdict

import pandas as pd
from surprise import SVD, Dataset, KNNBasic, Reader, accuracy
from surprise.model_selection import GridSearchCV, train_test_split

In [3]:
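# Read ISBN as text (keeps leading zeros and 'X' check digits); low_memory=False
# parses the file in one pass instead of inferring column types chunk by chunk
df_books = pd.read_csv("Books.csv", dtype={"ISBN": str}, low_memory=False)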
df_books



| | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-S | Image-URL-M | Image-URL-L |
|---|---|---|---|---|---|---|---|---|
| 0 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... |
| 1 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... |
| 2 | 0060973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial | http://images.amazon.com/images/P/0060973129.0... | http://images.amazon.com/images/P/0060973129.0... | http://images.amazon.com/images/P/0060973129.0... |
| 3 | 0374157065 | Flu: The Story of the Great Influenza Pandemic... | Gina Bari Kolata | 1999 | Farrar Straus Giroux | http://images.amazon.com/images/P/0374157065.0... | http://images.amazon.com/images/P/0374157065.0... | http://images.amazon.com/images/P/0374157065.0... |
| 4 | 0393045218 | The Mummies of Urumchi | E. J. W. Barber | 1999 | W. W. Norton & Company | http://images.amazon.com/images/P/0393045218.0... | http://images.amazon.com/images/P/0393045218.0... | http://images.amazon.com/images/P/0393045218.0... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 271355 | 0440400988 | There's a Bat in Bunk Five | Paula Danziger | 1988 | Random House Childrens Pub (Mm) | http://images.amazon.com/images/P/0440400988.0... | http://images.amazon.com/images/P/0440400988.0... | http://images.amazon.com/images/P/0440400988.0... |
| 271356 | 0525447644 | From One to One Hundred | Teri Sloat | 1991 | Dutton Books | http://images.amazon.com/images/P/0525447644.0... | http://images.amazon.com/images/P/0525447644.0... | http://images.amazon.com/images/P/0525447644.0... |
| 271357 | 006008667X | Lily Dale : The True Story of the Town that Ta... | Christine Wicker | 2004 | HarperSanFrancisco | http://images.amazon.com/images/P/006008667X.0... | http://images.amazon.com/images/P/006008667X.0... | http://images.amazon.com/images/P/006008667X.0... |
| 271358 | 0192126040 | Republic (World's Classics) | Plato | 1996 | Oxford University Press | http://images.amazon.com/images/P/0192126040.0... | http://images.amazon.com/images/P/0192126040.0... | http://images.amazon.com/images/P/0192126040.0... |


In [4]:
df_books = df_books[["ISBN", "Book-Title"]]
df_books

| | ISBN | Book-Title |
|---|---|---|
| 0 | 0195153448 | Classical Mythology |
| 1 | 0002005018 | Clara Callan |
| 2 | 0060973129 | Decision in Normandy |
| 3 | 0374157065 | Flu: The Story of the Great Influenza Pandemic... |
| 4 | 0393045218 | The Mummies of Urumchi |
| ... | ... | ... |
| 271355 | 0440400988 | There's a Bat in Bunk Five |
| 271356 | 0525447644 | From One to One Hundred |
| 271357 | 006008667X | Lily Dale : The True Story of the Town that Ta... |
| 271358 | 0192126040 | Republic (World's Classics) |


In [5]:
# Read ISBN as text so it matches df_books["ISBN"] exactly in the merge below
df_ratings = pd.read_csv("Ratings.csv", dtype={"ISBN": str})
df_ratings

| | User-ID | ISBN | Book-Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |
| ... | ... | ... | ... |
| 1149775 | 276704 | 1563526298 | 9 |
| 1149776 | 276706 | 0679447156 | 0 |
| 1149777 | 276709 | 0515107662 | 10 |
| 1149778 | 276721 | 0590442449 | 10 |


In [8]:
df_merged = pd.merge(df_ratings, df_books, on="ISBN", how="inner")
df_merged

| | User-ID | ISBN | Book-Rating | Book-Title |
|---|---|---|---|---|
| 0 | 276725 | 034545104X | 0 | Flesh Tones: A Novel |
| 1 | 276726 | 0155061224 | 5 | Rites of Passage |
| 2 | 276727 | 0446520802 | 0 | The Notebook |
| 3 | 276729 | 052165615X | 3 | Help!: Level 1 |
| 4 | 276729 | 0521795028 | 6 | The Amsterdam Connection : Level 4 (Cambridge ... |
| ... | ... | ... | ... | ... |
| 1031131 | 276704 | 0876044011 | 0 | Edgar Cayce on the Akashic Records: The Book o... |
| 1031132 | 276704 | 1563526298 | 9 | Get Clark Smart : The Ultimate Guide for the S... |
| 1031133 | 276706 | 0679447156 | 0 | Eight Weeks to Optimum Health: A Proven Progra... |
| 1031134 | 276709 | 0515107662 | 10 | The Sherbrooke Bride (Bride Trilogy (Paperback)) |


In [9]:
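"""
Check the ISBN column for illogical values.
Assume that an ISBN should be 10 or 13 characters long (hyphens removed):
- ISBN-10: 10 characters (in use before 2007); the last character is a
  check digit and may be 'X' (meaning 10)
- ISBN-13: 13 digits (current standard, in use since 2007)
Note: the test below requires every character to be a digit, so it also
flags ISBN-10 codes ending in 'X', which are valid ISBNs.
"""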
invalid_isbn = df_merged[~df_merged['ISBN'].str.replace('-', '').str.isdigit() | 
                         ~df_merged['ISBN'].str.replace('-', '').str.len().isin([10, 13])]
print("Illogical ISBN values:")
print(invalid_isbn)

Illogical ISBN values:
         User-ID        ISBN  Book-Rating  \
0         276725  034545104X            0   
3         276729  052165615X            3   
6         276744  038550120X            7   
10        276746  055356451X            0   
25        276762  034544003X            0   
...          ...         ...          ...   
1031064   276688  055308920X            0   
1031089   276688  068484267X            0   
1031093   276688  068810553X            0   
1031126   276704  059032120X            0   
1031129   276704  080410526X            0   

                                                Book-Title  
0                                     Flesh Tones: A Novel  
3                                           Help!: Level 1  
6                                          A Painted House  
10                                              Night Sins  
25       Southampton Row (Charlotte & Thomas Pitt N...  
...                                                    ...  
1031064  

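The merged DataFrame has 1,031,136 rows × 4 columns, and 85,392 of these rows (approximately 8%) are flagged by this check. Because the check accepts digits only, the flagged rows include ISBN-10 codes whose check character is 'X' (for example 034545104X and 052165615X in the output above). These are valid ISBNs, so the filter is stricter than the ISBN standard requires. In this project, all flagged rows are removed.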
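X-terminated codes can be kept while still rejecting malformed ones by checking ISBN validity with the standard check-digit rules. ISBN-10 uses weights 10 down to 1, and the sum must be divisible by 11. ISBN-13 uses alternating weights 1 and 3, and the sum must be divisible by 10. The sketch below is a plain-Python helper, `is_valid_isbn`, which is hypothetical and not part of the original notebook.

```python
import re

def is_valid_isbn(code: str) -> bool:
    """Return True if `code` is an ISBN-10 or ISBN-13 with a correct check digit."""
    s = code.replace("-", "").strip().upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        # ISBN-10: weights 10, 9, ..., 1; 'X' (allowed only in the last position) means 10
        values = [10 if c == "X" else int(c) for c in s]
        return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
    if re.fullmatch(r"[0-9]{13}", s):
        # ISBN-13: alternating weights 1, 3, 1, 3, ...
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False

for code in ["034545104X", "052165615X", "0195153448", "9780306406157", "0195153449"]:
    print(code, is_valid_isbn(code))
```

In pandas the helper would be applied as `df_merged['ISBN'].map(is_valid_isbn)`. It keeps the X-terminated codes that the digit-only test discards. It also rejects codes whose check digit is wrong, so the number of removed rows would differ from 85,392.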

In [10]:
# Remove zero ratings
df_merged = df_merged[df_merged['Book-Rating'] != 0]

# Check for empty values in the Book-Rating column
print("Empty values in Book-Rating column:", df_merged['Book-Rating'].isnull().sum())

# Check for illogical values (e.g., below 0 or above 10)
invalid_ratings = df_merged[(df_merged['Book-Rating'] < 0) | (df_merged['Book-Rating'] > 10)]
print("Illogical Book-Rating values:")
print(invalid_ratings)
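# Drop the rows flagged by the ISBN check; df_merged keeps its original index
# labels after the zero-rating filter, so invalid_isbn.index still lines up
df_clean = df_merged[~df_merged.index.isin(invalid_isbn.index)]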
print("Number of rows after cleaning:", len(df_clean))

Empty values in Book-Rating column: 0
Illogical Book-Rating values:
Empty DataFrame
Columns: [User-ID, ISBN, Book-Rating, Book-Title]
Index: []
Number of rows after cleaning: 352196


In [11]:
df_clean

| | User-ID | ISBN | Book-Rating | Book-Title |
|---|---|---|---|---|
| 1 | 276726 | 0155061224 | 5 | Rites of Passage |
| 4 | 276729 | 0521795028 | 6 | The Amsterdam Connection : Level 4 (Cambridge ... |
| 13 | 276747 | 0060517794 | 9 | Little Altars Everywhere |
| 16 | 276747 | 0671537458 | 9 | Waiting to Exhale |
| 17 | 276747 | 0679776818 | 8 | Birdsong: A Novel of Love and War |
| ... | ... | ... | ... | ... |
| 1031128 | 276704 | 0743211383 | 7 | Dreamcatcher |
| 1031130 | 276704 | 0806917695 | 5 | Perplexing Lateral Thinking Puzzles: Scholasti... |
| 1031132 | 276704 | 1563526298 | 9 | Get Clark Smart : The Ultimate Guide for the S... |
| 1031134 | 276709 | 0515107662 | 10 | The Sherbrooke Bride (Bride Trilogy (Paperback)) |


In [14]:
print("Create ISBN -> Book Title dictionary")
# This is needed later for displaying recommendations
isbn_to_title = df_clean[['ISBN', 'Book-Title']].drop_duplicates().set_index('ISBN')['Book-Title'].to_dict()
print(f"Saved {len(isbn_to_title)} book titles")
print("\nExamples:")
for isbn, title in list(isbn_to_title.items())[:3]:
    print(f"  {isbn} -> {title}")

Create ISBN -> Book Title dictionary
Saved 137710 book titles

Examples:
  0155061224 -> Rites of Passage
  0521795028 -> The Amsterdam Connection : Level 4 (Cambridge English Readers)
  0060517794 -> Little Altars Everywhere


## Data Preparation

In [15]:
df_surprise = df_clean[['User-ID', 'ISBN', 'Book-Rating']]
print("Selected columns for Surprise:")
print("User: User-ID")
print("Item: ISBN")
print("Rating: Book-Rating")
print()

# Check rating scale
min_rating = df_clean['Book-Rating'].min()
max_rating = df_clean['Book-Rating'].max()
print(f"Rating scale: {min_rating} - {max_rating}")
print()

# Create Surprise Reader and Dataset
reader = Reader(rating_scale=(min_rating, max_rating))
data = Dataset.load_from_df(df_surprise, reader)
print("Surprise Dataset created!")

Selected columns for Surprise:
User: User-ID
Item: ISBN
Rating: Book-Rating

Rating scale: 1 - 10

Surprise Dataset created!


## Splitting data into training and test sets

In [16]:
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)

print(f"Trainset: {trainset.n_ratings} ratings")
print(f"Testset: {len(testset)} ratings")

Trainset: 264147 ratings
Testset: 88049 ratings


In [17]:
# raw_ratings contains (user, item, rating, timestamp)
raw_ratings = data.raw_ratings

# Take only the first 3 elements (user, item, rating)
ratings_3cols = [(uid, iid, r) for (uid, iid, r, t) in raw_ratings]

# Take the first 50% of ratings for speed (a contiguous slice, not a random sample)
sample_size = int(len(ratings_3cols) * 0.5)
sampled_ratings = ratings_3cols[:sample_size]

# Create Dataset object
sampled_data = Dataset.load_from_df(
    pd.DataFrame(sampled_ratings, columns=['userID', 'itemID', 'rating']),
    reader=data.reader
)

# Small hyperparameter grid: 2 x 2 x 2 x 2 = 16 combinations
param_grid_svd = {
    'n_factors': [50, 100],
    'n_epochs': [20, 30],
    'lr_all': [0.002, 0.005],
    'reg_all': [0.02, 0.05]
}

# 3-fold cross-validation for every combination (48 fits); n_jobs=-1 uses all CPU cores
gs_svd = GridSearchCV(
    SVD,
    param_grid_svd,
    measures=['rmse', 'mae'],
    cv=3,
    n_jobs=-1
)

# Run optimization
gs_svd.fit(sampled_data)
print("Best RMSE:", gs_svd.best_score['rmse'])
print("Best parameter selection:", gs_svd.best_params['rmse'])

Best RMSE: 1.6452713031866093
Best parameter selection: {'n_factors': 50, 'n_epochs': 30, 'lr_all': 0.005, 'reg_all': 0.05}
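The cell above tunes on the first 50% of `raw_ratings`, which is a contiguous slice. If the file is grouped or sorted by user, such a slice over-represents some users and omits others entirely. A seeded random sample avoids that. The sketch below shows the idea in plain Python. The helper `random_half` and the toy data are hypothetical.

```python
import random

def random_half(ratings, seed=42):
    """Return a reproducible random 50% sample of `ratings` (a list of tuples)."""
    rng = random.Random(seed)        # private RNG: same sample on every run
    k = int(len(ratings) * 0.5)      # same size as the notebook's slice
    return rng.sample(ratings, k)

# Hypothetical toy data grouped by user: 10 users x 3 books each
ratings = [(user, f"book{b}", 5) for user in range(10) for b in range(3)]

first_half = ratings[: int(len(ratings) * 0.5)]   # contiguous slice, as in the cell above
sampled_half = random_half(ratings)

print("users in contiguous slice:", len({u for u, _, _ in first_half}))
print("users in random sample:   ", len({u for u, _, _ in sampled_half}))
```

With Surprise, the sampled tuples would be passed to `Dataset.load_from_df` exactly as in the cell above. The winning parameters can then be reused directly in the final model, e.g. `SVD(**gs_svd.best_params['rmse'])`.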


## Train a recommendation model

### The Problem: KNN Model - Memory Issue and Solution

On the first training attempt, we encountered a MemoryError:
```
MemoryError: Unable to allocate 20.8 GiB for an array with shape (74666, 74666) and data type int32
```

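At the time of this attempt the data had approximately 945,744 rows and 74,666 unique ISBNs. The row count (1,031,136 − 85,392) corresponds to the ISBN-cleaned data before zero ratings were removed. Item-based KNN builds a dense item × item matrix, so it needed a 74,666 × 74,666 array. At 4 bytes per int32 entry, that single array takes about 22.3 GB (20.8 GiB), which does not fit in the memory of a typical computer.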
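The memory figure in the error message follows directly from the matrix shape. Below is a quick check in plain Python, also applied to the 3,587 books reported by the filtering cell. It counts a single int32 matrix, as in the error message. The similarity computation also needs at least the float64 similarity matrix itself, so actual peak usage is higher.

```python
def dense_matrix_bytes(n_items: int, bytes_per_entry: int = 4) -> int:
    """Bytes for one dense n_items x n_items matrix (4 bytes per entry = int32)."""
    return n_items * n_items * bytes_per_entry

GIB = 2 ** 30
MIB = 2 ** 20

before = dense_matrix_bytes(74_666)   # item count from the MemoryError
after = dense_matrix_bytes(3_587)     # unique books after filtering

print(f"74,666 items: {before / GIB:.1f} GiB")
print(f" 3,587 items: {after / MIB:.1f} MiB")
```

Because memory grows with the square of the number of items, cutting the catalog by a factor of about 21 shrinks the matrix by a factor of about 430.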

### The Solution

To address this memory constraint, we implement filtering to reduce the dataset size by:
- Keeping only active users (those with at least 5 ratings)
- Keeping only popular books (those with at least 10 ratings from the remaining active users)

This approach significantly reduces the matrix dimensions while retaining the most informative data for the recommendation system.
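The book filter runs after the user filter, so a user can drop below 5 ratings once unpopular books are removed. If a strict guarantee is wanted, the two filters can be repeated until nothing changes (often called k-core filtering). Below is a plain-Python sketch. `iterative_filter` is a hypothetical helper; the notebook itself uses a single pass.

```python
from collections import Counter

def iterative_filter(ratings, min_user=5, min_book=10):
    """Repeat the user and book filters until every remaining user has at least
    `min_user` ratings and every remaining book at least `min_book` ratings.
    `ratings` is a list of (user, book, rating) tuples; order is preserved."""
    while True:
        user_counts = Counter(u for u, _, _ in ratings)
        kept = [r for r in ratings if user_counts[r[0]] >= min_user]
        book_counts = Counter(b for _, b, _ in kept)
        kept = [r for r in kept if book_counts[r[1]] >= min_book]
        if len(kept) == len(ratings):   # nothing removed in this pass: stable
            return kept
        ratings = kept

# Toy example: removing book b3 leaves user u2 with a single rating
toy = [("u1", "b1", 8), ("u1", "b2", 7), ("u2", "b1", 6),
       ("u2", "b3", 9), ("u3", "b1", 5), ("u3", "b2", 4)]
print(iterative_filter(toy, min_user=2, min_book=2))
```

A single pass on the toy data keeps 5 rows, including u2 with only one rating. The iterative version also drops u2 and stops at 4 rows.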

In [18]:
# KNN model

# Filter active users and popular books
min_ratings_user = 5
min_ratings_book = 10  

active_users = df_clean['User-ID'].value_counts()
active_users = active_users[active_users >= min_ratings_user].index
df_filtered = df_clean[df_clean['User-ID'].isin(active_users)]

popular_books = df_filtered['ISBN'].value_counts()
popular_books = popular_books[popular_books >= min_ratings_book].index
df_filtered = df_filtered[df_filtered['ISBN'].isin(popular_books)]

print("Number of rows after filtering:", len(df_filtered))
print("Unique users:", df_filtered['User-ID'].nunique())
print("Unique books:", df_filtered['ISBN'].nunique())

# Load as Surprise dataset (zeros were removed, so ratings lie on the 1-10 scale)
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(df_filtered[['User-ID', 'ISBN', 'Book-Rating']], reader)

# First 50% of rows (a contiguous slice, not a random sample) for a quicker GridSearchCV
sample_size = int(len(df_filtered) * 0.5)
df_sampled = df_filtered.iloc[:sample_size]
data_sampled = Dataset.load_from_df(df_sampled[['User-ID', 'ISBN', 'Book-Rating']], reader)

# Hyperparameters for KNN
param_grid_knn = {
    'k': [30, 40, 50],
    'sim_options': {
        'name': ['cosine', 'pearson'],
        'user_based': [False]  # item-based
    }
}

# GridSearchCV with smaller sample
gs_knn = GridSearchCV(KNNBasic, param_grid_knn, measures=['rmse', 'mae'], cv=3, n_jobs=-1)
gs_knn.fit(data_sampled)

print("Best RMSE (CV):", gs_knn.best_score['rmse'])
print("Best parameter selection (CV):", gs_knn.best_params['rmse'])

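# Final KNN model on the full filtered data with the best parameters from CV.
# Note: this trainset contains every rating, including those that end up in the
# test split created in the next cell, so scores on that split are optimistic.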
trainset = data.build_full_trainset()
best_params_knn = gs_knn.best_params['rmse']
algo_knn = KNNBasic(k=best_params_knn['k'], sim_options=best_params_knn['sim_options'])
algo_knn.fit(trainset)

print("KNN model trained on final trainset")

Number of rows after filtering: 83096
Unique users: 10459
Unique books: 3587
Best RMSE (CV): 1.7766223705855098
Best parameter selection (CV): {'k': 40, 'sim_options': {'name': 'pearson', 'user_based': False}}
Computing the pearson similarity matrix...
Done computing similarity matrix.
KNN model trained on final trainset
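The three counts printed above are enough to check how dense the filtered data actually is. The sketch below uses only those numbers.

```python
n_ratings = 83_096   # "Number of rows after filtering"
n_users = 10_459     # "Unique users"
n_books = 3_587      # "Unique books"

density = n_ratings / (n_users * n_books)   # share of user-book cells holding a rating
ratings_per_book = n_ratings / n_books
ratings_per_user = n_ratings / n_users

print(f"density: {density:.2%}")
print(f"ratings per book: {ratings_per_book:.1f}, per user: {ratings_per_user:.1f}")
```

Books average about 23 ratings each, yet fewer than 1 in 400 user–book cells is filled. The data is denser than before filtering but still very sparse.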


In [19]:
# SVD model
# Load dataset
reader = Reader(rating_scale=(1, 10))  # explicit ratings only (zeros removed)
data = Dataset.load_from_df(df_filtered[['User-ID', 'ISBN', 'Book-Rating']], reader)

# Split into train/test sets
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

# Train SVD with default hyperparameters (the gs_svd.best_params found
# earlier are not applied here)
algo_svd = SVD()
algo_svd.fit(trainset)

# Predict on test set
predictions = algo_svd.test(testset)

In [20]:
for i in range(10):
    print(predictions[i])

user: 139913     item: 0449221482 r_ui = 10.00   est = 8.19   {'was_impossible': False}
user: 30035      item: 0440214041 r_ui = 7.00   est = 7.11   {'was_impossible': False}
user: 134347     item: 0451169514 r_ui = 9.00   est = 7.98   {'was_impossible': False}
user: 196160     item: 0451176464 r_ui = 4.00   est = 7.85   {'was_impossible': False}
user: 189678     item: 0440220424 r_ui = 9.00   est = 8.23   {'was_impossible': False}
user: 30081      item: 0449907481 r_ui = 10.00   est = 7.98   {'was_impossible': False}
user: 244400     item: 0671042262 r_ui = 10.00   est = 8.00   {'was_impossible': False}
user: 52256      item: 0804106304 r_ui = 8.00   est = 8.07   {'was_impossible': False}
user: 21364      item: 0394758285 r_ui = 10.00   est = 7.31   {'was_impossible': False}
user: 34801      item: 0061020656 r_ui = 6.00   est = 7.20   {'was_impossible': False}


## Test the accuracy of models

In [21]:
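# Test KNN
# Caution: algo_knn was fit on build_full_trainset() of the same filtered data,
# so these test ratings were part of its training data (optimistic scores).
# For a fair comparison with SVD, refit first: algo_knn.fit(trainset)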
predictions_knn = algo_knn.test(testset)
rmse_knn = accuracy.rmse(predictions_knn, verbose=False)
mae_knn = accuracy.mae(predictions_knn, verbose=False)
print(f"KNN model:")
print(f"  RMSE: {rmse_knn:.3f}")
print(f"  MAE: {mae_knn:.3f}")
print()

# Test SVD
predictions_svd = algo_svd.test(testset)
rmse_svd = accuracy.rmse(predictions_svd, verbose=False)
mae_svd = accuracy.mae(predictions_svd, verbose=False)
print(f"SVD model:")
print(f"  RMSE: {rmse_svd:.3f}")
print(f"  MAE: {mae_svd:.3f}")

KNN model:
  RMSE: 0.838
  MAE: 0.542

SVD model:
  RMSE: 1.551
  MAE: 1.197
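Surprise's `accuracy.rmse` and `accuracy.mae` compute the standard formulas over a list of predictions. As an illustration, the sketch below recomputes both on the ten SVD predictions listed earlier, using their displayed estimates rounded to two decimals. Ten predictions are far too few to be representative. The point is that RMSE squares each error, so the single large miss (true rating 4, estimate 7.85) pushes RMSE well above MAE.

```python
import math

# (true rating, SVD estimate) for the ten predictions printed earlier
pairs = [(10, 8.19), (7, 7.11), (9, 7.98), (4, 7.85), (9, 8.23),
         (10, 7.98), (10, 8.00), (8, 8.07), (10, 7.31), (6, 7.20)]

errors = [r - est for r, est in pairs]
mae = sum(abs(e) for e in errors) / len(errors)              # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))   # root mean squared error

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```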


## Analysis of SVD and KNN Model Results (Updated After Zero-Rating Removal and KNN Optimization)

### Initial Model Performance with SVD
- **RMSE:** 3.586  
- **MAE:** 2.864  

The initial error metrics were quite high, which is typical for heterogeneous and partially noisy book rating datasets.

---

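### Updated Results After Zero-Rating Removal (SVD)
- **RMSE:** 1.551  
- **MAE:** 1.197  

**Why it improved:**  
- Zero ratings are implicit interactions (the user gave no score), not ratings of 0. Treating them as explicit scores mixed two different signals and pulled predictions toward 0.  
- After removal, the model is trained and evaluated only on explicit 1–10 ratings, which significantly reduced the error. Part of the improvement also reflects the changed evaluation target: the test set no longer contains 0 values, which lie far from the typical explicit rating.  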

---

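### KNN Model Results (Zero Ratings Removed and Hyperparameters Optimized)
- **RMSE:** 0.838  
- **MAE:** 0.542  

**Why these numbers look so good, and why they overstate KNN's accuracy:**  
1. **Data leakage**: the final KNN model was fit with `data.build_full_trainset()` on the whole filtered dataset, and the test set above is a split of that same dataset. Every test rating was therefore part of KNN's training data, while SVD was evaluated on genuinely held-out ratings.  
2. **Pearson similarity**: item–item similarity is computed on mean-centered ratings over the users two books share. Books count as similar when their ratings co-vary, regardless of their average level.  
3. **k=40** was selected by cross-validation, so predictions average over enough neighbors to be smooth without including too many weakly related books.  
4. **Item-based KNN** benefits from the filtering step: every remaining book has at least 10 ratings (about 23 on average), although the user–book matrix is still very sparse.  

→ On this test set, KNN's RMSE is about 46% lower and its MAE about 55% lower than SVD's. Because of point 1, this gap does not show that KNN generalizes better. Its cross-validated RMSE (1.777, on a 50% sample of the filtered data) is the fairer estimate.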

---

### Interpretation of Results
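- **Zero-rating removal** significantly improved SVD's accuracy, because the model no longer treats implicit entries as scores of 0.  
- **KNN's test-set scores are not comparable with SVD's**: KNN was trained on the test ratings. It must be refit on the same training split as SVD before we can conclude which model is more accurate.  
- **SVD model** was evaluated on held-out ratings, so its scores are a genuine estimate of generalization. It also scales better than item-based KNN, whose memory grows with the square of the number of books. This matters for large catalogs and top-N recommendation.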

---

### Conclusions
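- **SVD before:** RMSE/MAE ~2.9–3.6 → rough, mainly useful for ranking.  
- **SVD now:** RMSE/MAE ~1.2–1.6 → usable for rating predictions.  
- **KNN now:** test RMSE 0.838, MAE 0.542, but measured on ratings it was trained on; its cross-validated RMSE is 1.777. Which model is most accurate on this dataset remains to be established with a fair comparison.  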

---

### Improvement Opportunities
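- SVD: Optimize the number of latent factors (**n_factors**).  
- SVD: Fine-tune hyperparameters (**n_epochs**, **lr_all**, **reg_all**) and pass the best ones to the final model. The final `SVD()` currently uses the defaults rather than `gs_svd.best_params`.  
- KNN: Refit on the same training split as SVD, then try different `k` values, similarity measures, and orientation (item-based vs. user-based).  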
- Try alternative models:  
  - **SVD++**  
  - **BaselineOnly**  
  - **KNNBaseline**  

---

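### What Explains the KNN Results?
1. **Pearson similarity** removes differences in average rating level between books before comparing them.  
2. **Right k**: cross-validation selected k=40, enough neighbors to smooth the prediction without weighting in too many loosely related books.  
3. **Training the final model on the full dataset** does not make it genuinely more accurate; it puts the test ratings into the training data. For item-based KNNBasic, the target book is then one of the user's own rated books, usually with similarity 1 to itself, so the prediction is pulled toward the known true rating. This largely explains why the test RMSE (0.838) is far lower than the CV estimate (1.777).  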

---

### Latent Factors and n_factors (SVD)
- Latent factors describe hidden features such as genre, difficulty level, style, and popularity for books, as well as preferences for users.  
- **n_factors** determines how many such dimensions the model learns.  
- Too small → underfitting, too large → overfitting.  
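- In practice, n_factors is chosen by comparing cross-validated RMSE/MAE across candidate values (e.g., 20, 50, 100); the grid search above selected 50 from {50, 100}.  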
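For reference, Surprise's `SVD` (biased matrix factorization) predicts a rating as shown below. Here μ is the global mean, b_u and b_i are the user and book biases, and p_u and q_i are latent vectors whose length is `n_factors`. The parameters are learned by stochastic gradient descent on the regularized squared error. `lr_all` sets the step size and `reg_all` sets the regularization strength λ for all parameters.

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u,
\qquad p_u,\, q_i \in \mathbb{R}^{k},\quad k = \texttt{n\_factors}

\min_{b_*,\, p_*,\, q_*} \sum_{r_{ui} \in R_{\text{train}}} \left( r_{ui} - \hat{r}_{ui} \right)^2
  + \lambda \left( b_u^2 + b_i^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right),
\qquad \lambda = \texttt{reg\_all}
```

Each of the k dimensions corresponds to one latent factor. A larger k gives the model more capacity to fit individual tastes, and λ limits how far p_u and q_i can grow to fit noise.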

---

### Summary
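- **KNNBasic (item-based, Pearson, k=40)** → test RMSE 0.838, MAE 0.542. These scores are optimistically biased because the test ratings were part of its training data; its cross-validated RMSE is 1.777.  
- **SVD** → evaluated on held-out ratings: RMSE 1.551, MAE 1.197.  
- Removing zero ratings clearly improved SVD. Whether KNN is more accurate has to be re-checked with both models trained on the same split.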

## Create recommendations

In [28]:
print("Generate recommendations for users")
print("-" * 80)

# Random users (only 5 users)
sample_users = random.sample(trainset.all_users(), min(5, trainset.n_users))

# Create a lightweight anti-testset
# All books in the trainset (raw ISBNs), computed once
all_iids = {trainset.to_raw_iid(iid) for iid in trainset.all_items()}

anti_testset_sample = []
for uid_inner in sample_users:
    uid = trainset.to_raw_uid(uid_inner)
    
    # ISBNs of books the user has already rated
    user_rated_iids = {trainset.to_raw_iid(iid) for (iid, _) in trainset.ur[uid_inner]}
    
    # Books the user has not rated
    unrated_iids = list(all_iids - user_rated_iids)
    
    # For speed, score only 5 randomly chosen unrated books per user;
    # the "top-5" printed below is these 5 books sorted by predicted rating,
    # not the best 5 of all unrated books
    unrated_sample = random.sample(unrated_iids, min(5, len(unrated_iids)))
    
    for iid in unrated_sample:
        anti_testset_sample.append((uid, iid, 0))

# Predict only for these user–book pairs
predictions_sample = algo_svd.test(anti_testset_sample)

# Function to get top-N recommendations
def get_top_n(predictions, n=5):
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n

top_n = get_top_n(predictions_sample, n=5)

# Display recommendations for a few users
print("Top-5 book recommendations (using ISBN -> Title mapping!):\n")
for user_id in top_n:
    print(f"User: {user_id}")
    
    # Previous ratings
    user_books = df_clean[df_clean['User-ID'] == user_id][['Book-Title', 'Book-Rating']].values
    if len(user_books) > 0:
        print(f"  Previously read:")
        for book, rating in user_books[:3]:
            print(f"    • {book} (rating: {rating}/10)")
    
    print(f"  Recommendations:")
    for isbn, predicted_rating in top_n[user_id]:
        book_title = isbn_to_title.get(isbn, "Unknown book")
        print(f"    • {book_title}")
        print(f"      ISBN: {isbn}, Predicted rating: {predicted_rating:.1f}/10")
    print()

Generate recommendations for users
--------------------------------------------------------------------------------
Top-5 book recommendations (using ISBN -> Title mapping!):

User: 77724
  Previously read:
    • Travels With Charley: In Search of America (rating: 6/10)
    • The Amazing Adventures of Kavalier & Clay (rating: 9/10)
    • Digital Fortress : A Thriller (rating: 8/10)
  Recommendations:
    • The Golden Mean: In Which the Extraordinary Correspondence of Griffin & Sabine Concludes
      ISBN: 0811802981, Predicted rating: 8.5/10
    • Clear and Present Danger
      ISBN: 0399134409, Predicted rating: 7.8/10
    • Name of the Rose-Nla
      ISBN: 0446322180, Predicted rating: 7.8/10
    • Slightly Settled (Red Dress Ink)
      ISBN: 0373250479, Predicted rating: 7.7/10
    • Cold Fire
      ISBN: 0425130711, Predicted rating: 7.0/10

User: 264464
  Previously read:
    • Australia in the Seventies (rating: 3/10)
    • The Lucky Country: Australia Today (rating: 7/10
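The cell above ranks only five randomly drawn unrated books per user, so its "top-5" is a ranking of those five. A true top-N list scores every unrated book. The sketch below does that for a single user. `top_n_for_user` is a hypothetical helper, and `predict` stands in for any scoring function. With the model trained above it could be wired in as `lambda u, i: algo_svd.predict(u, i).est`, with raw ISBNs from `trainset.to_raw_iid` as items; this wiring is an assumption, not something the notebook does.

```python
import heapq
from typing import Callable, Iterable, List, Set, Tuple

def top_n_for_user(
    user: str,
    rated: Set[str],
    all_items: Iterable[str],
    predict: Callable[[str, str], float],
    n: int = 5,
) -> List[Tuple[str, float]]:
    """Score every item the user has not rated; return the n best (item, estimate) pairs."""
    candidates = ((item, predict(user, item)) for item in all_items if item not in rated)
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])

# Hypothetical toy scores standing in for a trained model
toy_scores = {"b1": 6.5, "b2": 8.1, "b3": 7.4, "b4": 9.0, "b5": 5.2}
recs = top_n_for_user("u1", {"b4"}, toy_scores, lambda u, i: toy_scores[i], n=2)
print(recs)  # [('b2', 8.1), ('b3', 7.4)]
```

For one user and about 3,600 books this is cheap. Scoring every user at once with `trainset.build_anti_testset()` instead produces tens of millions of user–book pairs for this dataset.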

## Individual prediction

In [29]:
print("Predict for a single user-book pair")
print("-" * 80)

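# Pick a fixed user and book from df_clean (first row's user, sixth row's ISBN).
# If the user or book was filtered out of the SVD trainset, SVD falls back to
# the global mean plus whichever bias term (user or book) is known.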
test_user = df_clean['User-ID'].iloc[0]
test_isbn = df_clean['ISBN'].iloc[5]
test_title = isbn_to_title[test_isbn]

prediction = algo_svd.predict(test_user, test_isbn)

print(f"User: {test_user}")
print(f"Book: {test_title}")
print(f"ISBN: {test_isbn}")
print(f"Predicted rating: {prediction.est:.1f}/10")

Predict for a single user-book pair
--------------------------------------------------------------------------------
User: 276726
Book: How to Deal With Difficult People
ISBN: 0943066433
Predicted rating: 7.8/10
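When a user (or book) is absent from the trainset, for example because the activity filter removed them, Surprise's `SVD` with its default `biased=True` still returns an estimate. It keeps only the terms it knows, so a cold-start user gets the global mean plus the book's bias. To check whether a user is known, call `trainset.to_inner_uid(test_user)`; it raises `ValueError` for users outside the trainset. The plain-Python sketch below mirrors that rule with hypothetical dictionaries. The parameter values are made up, not taken from this data.

```python
from typing import Dict, List

def biased_mf_estimate(
    user: str,
    item: str,
    global_mean: float,
    user_bias: Dict[str, float],
    item_bias: Dict[str, float],
    user_factors: Dict[str, List[float]],
    item_factors: Dict[str, List[float]],
) -> float:
    """Biased matrix-factorization estimate that drops terms for unknown users/items."""
    known_user = user in user_bias
    known_item = item in item_bias
    est = global_mean
    if known_user:
        est += user_bias[user]
    if known_item:
        est += item_bias[item]
    if known_user and known_item:
        # Latent interaction term q_i . p_u, only when both vectors exist
        est += sum(p * q for p, q in zip(user_factors[user], item_factors[item]))
    return est

# Made-up parameters for illustration
mu = 7.6
bu, bi = {"u1": 0.5}, {"b1": 0.2}
pu, qi = {"u1": [0.1, 0.2]}, {"b1": [1.0, 0.5]}

print(biased_mf_estimate("u1", "b1", mu, bu, bi, pu, qi))        # known user and book
print(biased_mf_estimate("new_user", "b1", mu, bu, bi, pu, qi))  # cold-start user
```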
