Now we will build our collaborative filtering model using the user rating data.

First we will import our cleaned anime and user data.

In [1]:
import pandas as pd

In [3]:
anime_filtered_df = pd.read_csv("data/anime_filtered.csv")

In [4]:
anime_filtered_df.head()

Unnamed: 0,anime_id,name,score,rank,genres,synopsis,type,episodes,popularity,members,studios,source,favorites,rating,year
0,1,cowboy bebop,8.75,41.0,"action, award winning, sci-fi","crime is timeless. by the year 2071, humanity ...",tv,26.0,43,1771505,sunrise,original,78525,rated 17,1998
1,5,cowboy bebop: tengoku no tobira,8.38,189.0,"action, sci-fi","another day, another bounty—such is the life o...",movie,1.0,602,360978,bones,original,1448,rated 17,2001
2,6,trigun,8.22,328.0,"action, adventure, sci-fi","vash the stampede is the man with a $$60,000,0...",tv,26.0,246,727252,madhouse,manga,15035,parental guidance 13,1998
3,7,witch hunter robin,7.25,2764.0,"action, drama, mystery, supernatural",robin sena is a powerful craft user drafted in...,tv,26.0,1795,111931,sunrise,original,613,parental guidance 13,2002
4,8,bouken ou beet,6.94,4240.0,"adventure, fantasy, supernatural",it is the dark century and the people are suff...,tv,52.0,5126,15001,toei animation,manga,14,parental guidance,2004


In [5]:
user_clean = pd.read_csv("data/user_clean.csv")

In [6]:
user_clean.head()

Unnamed: 0,user_id,anime_id,rating
0,48,9062,7
1,48,6746,8
2,48,6702,6
3,48,9314,7
4,48,9367,5


We will use Surprise's SVD algorithm, a matrix-factorization method inspired by Singular Value Decomposition that scales well to large, sparse user-item rating matrices. Users and anime are represented in a shared latent space, so a predicted rating can be computed from the dot product of a user vector and an anime vector. We will start by importing the necessary libraries.

In [9]:
from surprise import SVD, Dataset, Reader
from surprise.model_selection import train_test_split
from surprise import accuracy
import numpy as np
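
For reference, this is the prediction rule that Surprise documents for its `SVD` class, together with the regularized squared error it minimizes by stochastic gradient descent. Here $\mu$ is the global mean rating, $b_u$ and $b_i$ are user and item biases, and $p_u$, $q_i$ are latent vectors of length `n_factors`; $\lambda$ corresponds to `reg_all`, and the step size of the gradient updates to `lr_all`.

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u$$

$$\min_{b,\,p,\,q} \sum_{r_{ui} \in R_{\text{train}}} \left(r_{ui} - \hat{r}_{ui}\right)^2 + \lambda \left(b_u^2 + b_i^2 + \lVert p_u \rVert^2 + \lVert q_i \rVert^2\right)$$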

Prepare the Dataset

In [24]:
# Define a Reader object with the rating scale
reader = Reader(rating_scale=(1, 10)) 

In [25]:
# Load data into Surprise's Dataset
data = Dataset.load_from_df(user_clean[['user_id', 'anime_id', 'rating']], reader)

Split Data into Training and Test Sets

In [26]:
# Split into training and test sets (80% train, 20% test)
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

Train the SVD Model

_(We already ran a grid search with cross-validation for hyperparameter tuning in earlier runs, so we can simply plug the optimal parameters in here.)_

In [None]:
# Initialize the SVD algorithm
svd = SVD(n_factors=200,    # Number of latent factors from gridsearch
          n_epochs=20,      # Number of epochs from gridsearch
          lr_all=0.005,     # Learning rate from gridsearch
          reg_all=0.05)     # Regularization from gridsearch

In [28]:
# Train the model on the training set
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x24c000801d0>

Evaluate the Model using RMSE

In [29]:
# Predict on the test set
predictions = svd.test(testset)

In [30]:
# Evaluate RMSE
rmse = accuracy.rmse(predictions)
print(f"Test RMSE: {rmse}")

RMSE: 1.1114
Test RMSE: 1.111355181776524
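
As a reminder of what the number above measures, here is a minimal sketch of RMSE computed by hand. The `rmse` helper and the toy pairs are illustrative only and are not part of Surprise or of this notebook's data.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one prediction")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values (made up): every estimate is off by exactly one rating point
toy_pairs = [(8, 7.0), (6, 7.0), (10, 9.0), (5, 6.0)]
print(rmse(toy_pairs))  # 1.0
```

With Surprise's output, the same helper could be fed `((p.r_ui, p.est) for p in predictions)`. An RMSE of about 1.11 therefore means predictions are typically a little over one point off on the 1 to 10 scale.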


We also ran several earlier test runs to find the best dataset combination of users and anime.

Run 1 with 50 rated, 50 rating:
- Test RMSE: 1.1829577065272816

Run 2 with 100 rated, 200 rating:
- Test RMSE: 1.1606663648799898

Run 3 with 200 rated, 500 rating:
- Test RMSE: 1.1269778555038497 

Run 4 with 400 rated, 1000 rating:
- Test RMSE: 1.1334690466014203

Run 5 with 300 rated, 800 rating:
- Test RMSE: 1.111355181776524
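
The run labels do not spell out exactly how each dataset was built. If they refer to minimum-activity thresholds (an assumption), the filtering could look roughly like the sketch below. The function name `filter_by_activity` and its two threshold parameters are hypothetical.

```python
from collections import Counter


def filter_by_activity(ratings, min_user_ratings, min_anime_ratings):
    """Keep (user_id, anime_id, rating) rows whose user and anime both meet a minimum count.

    Counts are taken once on the unfiltered data, so a kept user can end up
    with fewer than `min_user_ratings` rows after sparse anime are dropped.
    """
    user_counts = Counter(user_id for user_id, _, _ in ratings)
    anime_counts = Counter(anime_id for _, anime_id, _ in ratings)
    return [
        (user_id, anime_id, rating)
        for user_id, anime_id, rating in ratings
        if user_counts[user_id] >= min_user_ratings
        and anime_counts[anime_id] >= min_anime_ratings
    ]


# Toy rows (made up), not taken from user_clean
toy = [(1, 10, 8), (1, 11, 7), (2, 10, 9), (3, 12, 5)]
kept = filter_by_activity(toy, min_user_ratings=2, min_anime_ratings=2)
print(kept)  # [(1, 10, 8)]
```

Raising the thresholds leaves a smaller, denser rating matrix, which is one plausible reason the test RMSE differs between runs.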

Let's do hyperparameter tuning with GridSearchCV (this was already done in an earlier run).

In [36]:
from surprise.model_selection import GridSearchCV
from surprise import SVD, accuracy

In [None]:
# Define parameter grid
param_grid = {
    'n_factors': [50, 100, 200, 300],
    'n_epochs': [10, 20, 30],
    'lr_all': [0.003, 0.005, 0.01],
    'reg_all': [0.02, 0.05, 0.1]
}

In [51]:
# Perform grid search
grid_search = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=3, n_jobs=-1)
grid_search.fit(data)

In [52]:
# Output the best score and parameters
print("Best RMSE:", grid_search.best_score['rmse'])
print("Best Parameters:", grid_search.best_params['rmse'])

Best RMSE: 1.1245594745629168
Best Parameters: {'n_factors': 200, 'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.05}
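
To get a feel for the cost of this search, the sketch below counts how many models the grid above implies. It only enumerates the combinations with `itertools.product` and does not call Surprise; the variable names `cv_folds`, `combinations` and `n_fits` are illustrative.

```python
from itertools import product

param_grid = {
    "n_factors": [50, 100, 200, 300],
    "n_epochs": [10, 20, 30],
    "lr_all": [0.003, 0.005, 0.01],
    "reg_all": [0.02, 0.05, 0.1],
}
cv_folds = 3  # matches cv=3 in the grid search above

# One dict per parameter combination, in the order the grid is declared
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_fits = len(combinations) * cv_folds

print(len(combinations), n_fits)  # 108 324
```

Each extra value added to any one list multiplies the number of fits, so it is usually worth narrowing the grid around the values that earlier runs favored.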


400 users and 1000+ ratings:
- Best RMSE: 1.1219733459903392
- Best Parameters: {'n_factors': 50, 'n_epochs': 20, 'lr_all': 0.01, 'reg_all': 0.05}

300 users and 800+ ratings:
- Best RMSE: 1.1302356374242466
- Best Parameters: {'n_factors': 100, 'n_epochs': 20, 'lr_all': 0.01, 'reg_all': 0.05}

- Best RMSE: 1.124383666758084
- Best Parameters: {'n_factors': 200, 'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.05}
- Test RMSE: 1.111182922356423

- Best RMSE: 1.1230012845978004
- Best Parameters: {'n_factors': 300, 'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.05}
- Test RMSE: 1.1092369091404355

In [53]:
# Train SVD with the best parameters
best_svd = grid_search.best_estimator['rmse']
best_svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x2a21f6b2de0>

In [55]:
# Test predictions
predictions = best_svd.test(testset)

In [56]:
# Evaluate RMSE
rmse = accuracy.rmse(predictions)
print(f"Final Test RMSE: {rmse}")

RMSE: 1.1112
Final Test RMSE: 1.111182922356423


With our tuned model, we can now generate some anime recommendations.

New user's ratings for Cowboy Bebop, Trigun and Monster:

In [None]:
new_user_ratings = [
    (0, 1, 8),    # User ID 0, Anime ID 1 (Cowboy Bebop), Rating 8
    (0, 6, 7),    # User ID 0, Anime ID 6 (Trigun), Rating 7
    (0, 19, 10),  # User ID 0, Anime ID 19 (Monster), Rating 10
]

Make Predictions:

In [32]:
# List of all anime IDs
all_anime_ids = user_clean['anime_id'].unique()

In [33]:
# Anime already rated by the new user
rated_anime_ids = [rating[1] for rating in new_user_ratings]

In [34]:
# Generate predictions for all other anime
recommendations = []
for anime_id in all_anime_ids:
    if anime_id not in rated_anime_ids:
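        # Predict for the new user (user_id = 0) and unseen anime.
        # Note: new_user_ratings is never added to the training data, so if
        # user 0 is not in the trainset the estimate ignores those ratings.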
        pred = svd.predict(uid=0, iid=anime_id)
        recommendations.append((anime_id, pred.est))

In [35]:
# Sort recommendations by predicted rating
recommendations = sorted(recommendations, key=lambda x: x[1], reverse=True)

In [36]:
# Top 10 recommendations
top_10 = recommendations[:10]
print("Top 10 Recommendations:")
for anime_id, rating in top_10:
    print(f"Anime ID: {anime_id}, Predicted Rating: {rating}")

Top 10 Recommendations:
Anime ID: 5114, Predicted Rating: 8.817364631936927
Anime ID: 9253, Predicted Rating: 8.678385913965682
Anime ID: 32281, Predicted Rating: 8.665134990366706
Anime ID: 44, Predicted Rating: 8.66042625471332
Anime ID: 28977, Predicted Rating: 8.647542828389533
Anime ID: 15335, Predicted Rating: 8.647527995831881
Anime ID: 4181, Predicted Rating: 8.603174197309315
Anime ID: 9969, Predicted Rating: 8.597680425914291
Anime ID: 15417, Predicted Rating: 8.5694255235959
Anime ID: 820, Predicted Rating: 8.516752173439695
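
One caveat about this list: if user ID 0 does not occur in `user_clean` (an assumption, since only the first rows are shown above), the model has no bias or latent vector for that user. For a user it has never seen, Surprise's SVD estimate reduces to the global mean plus the item bias. The sketch below is a simplified re-implementation of that rule with made-up toy parameters, not Surprise's own code. The real `predict` also clips the result to the rating scale.

```python
def estimate(global_mean, user_bias, item_bias, user_factors, item_factors, uid, iid):
    """Simplified biased matrix-factorization estimate.

    Terms that depend on an unseen user or item are skipped.
    """
    known_user = uid in user_bias
    known_item = iid in item_bias
    est = global_mean
    if known_user:
        est += user_bias[uid]
    if known_item:
        est += item_bias[iid]
    if known_user and known_item:
        est += sum(p * q for p, q in zip(user_factors[uid], item_factors[iid]))
    return est


# Toy parameters (made up for illustration, not learned from the data)
global_mean = 7.5
user_bias = {48: 0.25}
item_bias = {5114: 1.0, 820: 0.5}
user_factors = {48: [0.5, -0.5]}
item_factors = {5114: [0.5, 0.5], 820: [1.0, 0.0]}
args = (global_mean, user_bias, item_bias, user_factors, item_factors)

print(estimate(*args, uid=48, iid=820))   # known user: 7.5 + 0.25 + 0.5 + 0.5 = 8.75
print(estimate(*args, uid=0, iid=5114))   # unknown user: 7.5 + 1.0 = 8.5
```

Under that assumption, the ranking above reflects overall item quality rather than the three ratings in `new_user_ratings`. To personalize it, those rows would need to be appended to the ratings data and the model refit before calling `predict`.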


Now we can add the anime names and genres from the anime_filtered dataset to this result.

In [37]:
# anime_filtered_df has columns 'anime_id', 'name', and 'genres'
top_10_df = pd.DataFrame(top_10, columns=['anime_id', 'predicted_rating'])
top_10_detailed = pd.merge(top_10_df, anime_filtered_df, on='anime_id')

Since we ran a few test runs we can easily see the improvement here in Run 5:

In [38]:
# Display top 10 with details
top_10_detailed[['anime_id', 'name', 'genres', 'predicted_rating']]

Unnamed: 0,anime_id,name,genres,predicted_rating
0,5114,fullmetal alchemist: brotherhood,"action, adventure, drama, fantasy",8.817365
1,9253,steins;gate,"drama, sci-fi, suspense",8.678386
2,32281,kimi no na wa.,"award winning, drama, supernatural",8.665135
3,44,rurouni kenshin: meiji kenkaku romantan - tsui...,"action, drama, romance",8.660426
4,28977,gintama°,"action, comedy, sci-fi",8.647543
5,15335,gintama movie 2: kanketsu-hen - yorozuya yo ei...,"action, comedy, sci-fi",8.647528
6,4181,clannad: after story,"drama, romance, supernatural",8.603174
7,9969,gintama',"action, comedy, sci-fi",8.59768
8,15417,gintama': enchousen,"action, comedy, sci-fi",8.569426
9,820,ginga eiyuu densetsu,"drama, sci-fi",8.516752


Run 1:
| Name                                      | Genres                                | Predicted Rating |
|-------------------------------------------|---------------------------------------|------------------|
| Kimi no Na wa.                            | Award Winning, Drama, Supernatural    | 9.049190         |
| Fullmetal Alchemist: Brotherhood          | Action, Adventure, Drama, Fantasy     | 9.016155         |
| Steins;Gate                               | Drama, Sci-Fi, Suspense               | 8.922295         |
| Rurouni Kenshin: Meiji Kenkaku Romantan - Tsui... | Action, Drama, Romance               | 8.852004         |
| Sen to Chihiro no Kamikakushi             | Adventure, Award Winning, Supernatural| 8.798092         |
| The First Slam Dunk                       | Award Winning, Sports                 | 8.781020         |
| Clannad: After Story                      | Drama, Romance, Supernatural          | 8.766202         |
| Doupo Cangqiong: San Nian Zhi Yue         | Action, Fantasy                       | 8.762913         |
| Ginga Eiyuu Densetsu                      | Drama, Sci-Fi                         | 8.753458         |
| Gintama Movie 2: Kanketsu-hen - Yorozuya yo Ei... | Action, Comedy, Sci-Fi               | 8.746942         |


Run 2:
| Name                                      | Genres                                | Predicted Rating |
|-------------------------------------------|---------------------------------------|------------------|
| Fullmetal Alchemist: Brotherhood          | Action, Adventure, Drama, Fantasy     | 9.000304         |
| Kimi no Na wa.                            | Award Winning, Drama, Supernatural    | 8.991726         |
| Steins;Gate                               | Drama, Sci-Fi, Suspense               | 8.839333         |
| Gintama°                                  | Action, Comedy, Sci-Fi                | 8.766442         |
| Sen to Chihiro no Kamikakushi             | Adventure, Award Winning, Supernatural| 8.723244         |
| Ookami Kodomo no Ame to Yuki              | Award Winning, Fantasy, Slice of Life | 8.712725         |
| Gintama'                                  | Action, Comedy, Sci-Fi                | 8.706339         |
| Gintama Movie 2: Kanketsu-hen - Yorozuya yo Ei... | Action, Comedy, Sci-Fi               | 8.704523         |
| Hunter x Hunter (2011)                    | Action, Adventure, Fantasy            | 8.701267         |
| Hajime no Ippo                            | Sports                                | 8.692973         |


Run 3:
| Name                                      | Genres                                | Predicted Rating |
|-------------------------------------------|---------------------------------------|------------------|
| Fullmetal Alchemist: Brotherhood          | Action, Adventure, Drama, Fantasy     | 8.975653         |
| Kimi no Na wa.                            | Award Winning, Drama, Supernatural    | 8.844541         |
| Steins;Gate                               | Drama, Sci-Fi, Suspense               | 8.829259         |
| Gintama'                                  | Action, Comedy, Sci-Fi                | 8.761310         |
| Hunter x Hunter (2011)                    | Action, Adventure, Fantasy            | 8.740554         |
| Ginga Eiyuu Densetsu                      | Drama, Sci-Fi                         | 8.728148         |
| Gintama°                                  | Action, Comedy, Sci-Fi                | 8.704312         |
| Gintama Movie 2: Kanketsu-hen - Yorozuya yo Ei... | Action, Comedy, Sci-Fi               | 8.672002         |
| Rurouni Kenshin: Meiji Kenkaku Romantan - Tsui... | Action, Drama, Romance               | 8.657989         |
| Koe no Katachi                            | Award Winning, Drama                  | 8.650224         |


Run 4: 
| Name                                      | Genres                                | Predicted Rating |
|-------------------------------------------|---------------------------------------|------------------|
| Fullmetal Alchemist: Brotherhood          | Action, Adventure, Drama, Fantasy     | 8.817365         |
| Steins;Gate                               | Drama, Sci-Fi, Suspense               | 8.678386         |
| Kimi no Na wa.                            | Award Winning, Drama, Supernatural    | 8.665135         |
| Rurouni Kenshin: Meiji Kenkaku Romantan - Tsui... | Action, Drama, Romance               | 8.660426         |
| Gintama°                                  | Action, Comedy, Sci-Fi                | 8.647543         |
| Gintama Movie 2: Kanketsu-hen - Yorozuya yo Ei... | Action, Comedy, Sci-Fi               | 8.647528         |
| Clannad: After Story                      | Drama, Romance, Supernatural          | 8.603174         |
| Gintama'                                  | Action, Comedy, Sci-Fi                | 8.597680         |
| Gintama': Enchousen                       | Action, Comedy, Sci-Fi                | 8.569426         |
| Ginga Eiyuu Densetsu                      | Drama, Sci-Fi                         | 8.516752         |


Saving files

In [39]:
from surprise import dump

In [40]:
# Save the SVD model
dump.dump("data/svd_model_3", algo=svd)

In [41]:
import joblib
# Save the trained SVD model
joblib.dump(svd, "data/svd_model_3.pkl")

['data/svd_model_3.pkl']