# Collaborative Recommender

In this notebook, we build a system that uses users' actions (their ratings and watch lists) to recommend other anime. There are two approaches:\
<b>Item-based approach</b> -> measure the similarity between two items based on how users rated them.\
<b>User-based approach</b> -> look at the history of a user's ratings or watched shows and recommend from that. This tends not to work as well because users' preferences change over time.\
This notebook uses the item-based approach.
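To make the item-based idea concrete, here is a minimal sketch on a made-up matrix of 3 users by 4 anime (the titles and scores are invented for illustration, and 0 means "not rated"). Each anime is represented by the column of scores users gave it, and two anime count as similar when the same users rated them alike. Similarity here is cosine similarity, the same measure the `NearestNeighbors` model later in this notebook uses (in distance form).

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are anime; 0 means "not rated"
titles = ["Anime A", "Anime B", "Anime C", "Anime D"]
ratings = np.array([
    [9, 8, 0, 1],
    [7, 7, 1, 0],
    [0, 1, 9, 8],
], dtype=float)

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (1.0 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Item-based: compare anime by their columns (the scores each user gave them)
item_vectors = ratings.T
sim_ab = cosine_similarity(item_vectors[0], item_vectors[1])
sim_ac = cosine_similarity(item_vectors[0], item_vectors[2])
print(f"{titles[0]} vs {titles[1]}: {sim_ab:.3f}")
print(f"{titles[0]} vs {titles[2]}: {sim_ac:.3f}")
```

Anime A and B come out close to 1 because the first two users rated both highly, while A and C share almost no raters and come out close to 0.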

## Import Dependencies + Data

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
anime_data = pd.read_csv("anime.csv") # read in anime csv file
rating_df = pd.read_csv("animelist.csv") # read in animelist csv file

anime_data = anime_data.rename(columns={"MAL_ID": "anime_id"})
anime_contact_data = anime_data[["anime_id", "Name"]]


In [3]:
rating_data = rating_df.merge(anime_contact_data, left_on = 'anime_id', right_on = 'anime_id', how = 'left')
rating_data = rating_data[["user_id", "Name", "anime_id","rating", "watching_status", "watched_episodes"]]
rating_data.head()

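| | user_id | Name | anime_id | rating | watching_status | watched_episodes |
|---|---|---|---|---|---|---|
| 0 | 0 | Basilisk: Kouga Ninpou Chou | 67 | 9 | 1 | 1 |
| 1 | 0 | Fairy Tail | 6702 | 7 | 1 | 4 |
| 2 | 0 | Gokusen | 242 | 10 | 1 | 4 |
| 3 | 0 | Kuroshitsuji | 4898 | 0 | 1 | 1 |
| 4 | 0 | One Piece | 21 | 10 | 1 | 0 |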


In [4]:
rating_df.shape

(109224747, 5)

Keep only anime that have at least 200 entries (ratings) and users who have at least 500 entries in their list.

In [5]:
count = rating_data['user_id'].value_counts()
count1 = rating_data['anime_id'].value_counts()
rating_data = rating_data[rating_data['user_id'].isin(count[count >= 500].index)].copy()
rating_data = rating_data[rating_data['anime_id'].isin(count1[count1 >= 200].index)].copy()
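Both `value_counts()` results in the cell above are computed before either filter is applied, so the anime threshold counts entries from all users, including users that the user threshold then removes. That is why an anime can end up with fewer than 200 entries after filtering (the minimum printed after cell `In [9]` is 138). A minimal sketch on toy data, with hypothetical thresholds of 2 entries per user and 2 per anime standing in for 500 and 200:

```python
import pandas as pd

# Hypothetical toy list entries (user_id, anime_id)
toy = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, 3],
    "anime_id": [10, 20, 30, 10, 20, 30],
})
MIN_USER, MIN_ANIME = 2, 2  # hypothetical thresholds standing in for 500 and 200

# Same pattern as the cell above: both counts come from the unfiltered frame
user_counts = toy["user_id"].value_counts()
anime_counts = toy["anime_id"].value_counts()
filtered = toy[toy["user_id"].isin(user_counts[user_counts >= MIN_USER].index)]
filtered = filtered[filtered["anime_id"].isin(anime_counts[anime_counts >= MIN_ANIME].index)]

# Anime 30 passed with 2 entries, but user 3 was dropped, so it is left with 1
print(filtered["anime_id"].value_counts().sort_index())
```

If every remaining anime must meet the threshold, the counts would have to be recomputed after the user filter (and possibly iterated, since dropping anime lowers user counts in turn).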

In [6]:
# rating_df is the unfiltered original, so its shape is unchanged; the filtered rows are in rating_data
rating_df.shape

(109224747, 5)

In [7]:
combine_movie_rating = rating_data.dropna(axis = 0, subset = ['Name'])
movie_ratingCount = (combine_movie_rating.
     groupby(by = ['Name'])['rating'].
     count().
     reset_index()
     [['Name', 'rating']]
    )
movie_ratingCount.head()

| | Name | rating |
|---|---|---|
| 0 | "0" | 1203 |
| 1 | "Bungaku Shoujo" Kyou no Oyatsu: Hatsukoi | 5844 |
| 2 | "Bungaku Shoujo" Memoire | 7243 |
| 3 | "Bungaku Shoujo" Movie | 13348 |
| 4 | "Eiji" | 667 |


In [8]:
rating_data = combine_movie_rating.merge(movie_ratingCount, left_on = 'Name', right_on = 'Name', how = 'left')
rating_data = rating_data.drop(columns = "rating_x")
rating_data = rating_data.rename(columns={"rating_y": "rating"})
rating_data


| | user_id | Name | anime_id | watching_status | watched_episodes | rating |
|---|---|---|---|---|---|---|
| 0 | 6 | Angel Beats! Specials | 9062 | 1 | 1 | 27206 |
| 1 | 6 | Ao no Exorcist | 9919 | 1 | 2 | 52766 |
| 2 | 6 | Blood+ | 150 | 1 | 15 | 28193 |
| 3 | 6 | Casshern Sins | 4981 | 1 | 12 | 18356 |
| 4 | 6 | Guilty Crown | 10793 | 1 | 2 | 51561 |
| ... | ... | ... | ... | ... | ... | ... |
| 59802842 | 353398 | Tales of Zestiria the Cross 2nd Season | 34086 | 6 | 0 | 14917 |
| 59802843 | 353398 | Uchouten Kazoku | 17909 | 6 | 0 | 22725 |
| 59802844 | 353398 | Urara Meirochou | 32924 | 6 | 0 | 14891 |
| 59802845 | 353398 | Yamada-kun to 7-nin no Majo: Mou Hitotsu no Su... | 24627 | 6 | 0 | 12141 |
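In cell `In [8]`, the merge produces two rating columns: `rating_x`, the score the user gave, and `rating_y`, the number of entries each title received (from `movie_ratingCount`). Dropping `rating_x` and renaming `rating_y` means that from here on `rating` holds a per-title entry count, identical on every row for the same anime, rather than the user's score. `count()` also counts every non-null value, including scores of 0. The toy sketch below (hypothetical data) repeats the same steps:

```python
import pandas as pd

# Hypothetical toy entries: two users, with their own scores in "rating"
toy = pd.DataFrame({
    "user_id": [1, 1, 2],
    "Name": ["Anime A", "Anime B", "Anime A"],
    "rating": [9, 0, 6],
})

# Same steps as cells In [7] and In [8]: count per title, merge back, swap columns
counts = toy.groupby(by=["Name"])["rating"].count().reset_index()[["Name", "rating"]]
merged = toy.merge(counts, left_on="Name", right_on="Name", how="left")
merged = merged.drop(columns="rating_x").rename(columns={"rating_y": "rating"})

# "rating" now holds per-title entry counts (A: 2, B: 1), not the users' scores
print(merged)
```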


In [9]:
# Encoding categorical data
user_ids = rating_data["user_id"].unique().tolist()
user2user_encoded = {x: i for i, x in enumerate(user_ids)}
user_encoded2user = {i: x for i, x in enumerate(user_ids)}
rating_data["user"] = rating_data["user_id"].map(user2user_encoded)
n_users = len(user2user_encoded)

anime_ids = rating_data["anime_id"].unique().tolist()
anime2anime_encoded = {x: i for i, x in enumerate(anime_ids)}
anime_encoded2anime = {i: x for i, x in enumerate(anime_ids)}
rating_data["anime"] = rating_data["anime_id"].map(anime2anime_encoded)
n_animes = len(anime2anime_encoded)

print("Num of users: {}, Num of animes: {}".format(n_users, n_animes))
print("Min total rating: {}, Max total rating: {}".format(min(rating_data['rating']), max(rating_data['rating'])))



Num of users: 66698, Num of animes: 11812
Min total rating: 138, Max total rating: 60305
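The dictionaries in cell `In [9]` map each ID to a consecutive integer in order of first appearance. `pd.factorize` produces the same codes, plus an array of the unique values for decoding, in one call. A minimal sketch on hypothetical toy IDs (the `user` and `anime` columns are not used by the KNN model later in this notebook, so this is only a more compact alternative):

```python
import pandas as pd

# Hypothetical toy IDs in the order they appear in the data
toy_user_id = pd.Series([353398, 6, 6, 12, 353398])

# Dictionary approach, as in cell In [9]
toy_ids = toy_user_id.unique().tolist()
toy_id2code = {x: i for i, x in enumerate(toy_ids)}
encoded_dict = toy_user_id.map(toy_id2code)

# pd.factorize numbers values in order of first appearance (sort=False is the default)
codes, uniques = pd.factorize(toy_user_id)

print(codes)          # [0 1 1 2 0]
print(list(uniques))  # decoding table: uniques[code] gives back the original ID
```

Here `uniques` plays the role of `user_encoded2user`.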


In [13]:
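# Keep only the 20 users with the most entries
group = rating_data.groupby('user_id')['rating'].count()
top_users = group.dropna().sort_values(ascending=False)[:20]
top_r = rating_data.join(top_users, rsuffix='_r', how='inner', on='user_id')

# Then keep only the 20 anime with the most entries overall
group = rating_data.groupby('anime_id')['rating'].count()
top_animes = group.dropna().sort_values(ascending=False)[:20]
# Use a different suffix: 'rating_r' already exists from the join above
top_r = top_r.join(top_animes, rsuffix='_a', how='inner', on='anime_id')

# user x anime table; the string "sum" avoids the FutureWarning recent pandas versions raise for np.sum
pivot = pd.crosstab(top_r.user_id, top_r.anime_id, top_r.rating, aggfunc="sum")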


In [14]:
pivot.fillna(0, inplace=True)
pivot

| user_id / anime_id | 226 | 1535 | 1575 | 2001 | 2167 | 4224 | 5081 | 5114 | 6547 | 6746 | 9253 | 9989 | 10620 | 11757 | 14813 | 15809 | 16498 | 19815 | 20507 | 30276 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11100 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 20807 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 50485 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 63900 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 68042 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 85472 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 92529 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 122341 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 131988 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |
| 140590 | 53043 | 58995 | 56793 | 53605 | 53751 | 59602 | 57137 | 56781 | 59828 | 55062 | 59047 | 54139 | 56273 | 60145 | 53339 | 54875 | 60305 | 57006 | 55486 | 55762 |


In [15]:
# Pivot table with one row per anime Name and one column per user_id, saved as piv_table

piv_table = rating_data.pivot_table(index="Name",columns="user_id", values="rating").fillna(0)
piv_table

| Name / user_id | 6 | 12 | 17 | 19 | 21 | 42 | 44 | 47 | 53 | 60 | ... | 353352 | 353353 | 353357 | 353365 | 353370 | 353383 | 353385 | 353390 | 353395 | 353398 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| "0" | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| "Bungaku Shoujo" Kyou no Oyatsu: Hatsukoi | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5844.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| "Bungaku Shoujo" Memoire | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7243.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| "Bungaku Shoujo" Movie | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 13348.0 | 13348.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| "Eiji" | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| xxxHOLiC Movie: Manatsu no Yoru no Yume | 0.0 | 0.0 | 10103.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10103.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 10103.0 | 0.0 | 10103.0 | 10103.0 | 10103.0 | 0.0 |
| xxxHOLiC Rou | 0.0 | 0.0 | 10009.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10009.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 10009.0 | 0.0 | 0.0 | 10009.0 | 0.0 | 0.0 |
| xxxHOLiC Shunmuki | 0.0 | 0.0 | 10859.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10859.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 10859.0 | 0.0 | 0.0 | 10859.0 | 0.0 | 0.0 |
| ēlDLIVE | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7507.0 | 0.0 | 7507.0 |


In [26]:
piv_table.to_csv('animeforapp.csv') 

## Model Creation

Implementing KNN\
`piv_table` is a 2D matrix with one row per anime and one column per user, and its missing values are already filled with zeros (needed because we compute distances between these row vectors). We then convert it into a SciPy sparse matrix, which stores only the nonzero entries and makes the calculations more efficient.
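Because most of `piv_table` is zeros, the dense table is the expensive part. One way to skip it, sketched below under stated assumptions, is to build the CSR matrix straight from the long-format data with SciPy's `(data, (row, col))` constructor, using `pd.factorize` codes as row and column positions. The toy frame is hypothetical, and this sketch indexes rows by `anime_id`, whereas `piv_table` groups by `Name`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy long-format data: one row per (user, anime) entry
toy = pd.DataFrame({
    "user_id":  [6, 6, 12, 17, 17],
    "anime_id": [150, 9919, 150, 9919, 4981],
    "rating":   [8, 7, 9, 6, 10],
})

# Consecutive integer codes for rows (anime) and columns (users)
anime_codes, anime_index = pd.factorize(toy["anime_id"])
user_codes, user_index = pd.factorize(toy["user_id"])

# Build the anime x user matrix directly; only nonzero entries are stored.
# Caveat: duplicate (row, col) pairs would be summed, whereas pivot_table averages them.
matrix = csr_matrix(
    (toy["rating"].to_numpy(dtype=float), (anime_codes, user_codes)),
    shape=(len(anime_index), len(user_index)),
)

# Same values as the dense pivot, once rows/columns are put in first-appearance order
dense = toy.pivot_table(index="anime_id", columns="user_id", values="rating").fillna(0)
dense = dense.loc[anime_index, user_index]
assert np.array_equal(matrix.toarray(), dense.to_numpy())
print(matrix.nnz, "stored values for a", matrix.shape, "matrix")
```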

In [16]:
# We use an unsupervised nearest-neighbours model (NearestNeighbors, fitted in the next cell)
# with metric='cosine', so anime are compared by the cosine of the angle between their row vectors.
# First, store the pivot table as a SciPy CSR sparse matrix, since most of its entries are zero.

from scipy.sparse import csr_matrix
piv_table_matrix = csr_matrix(piv_table.values)

In [17]:
from sklearn.neighbors import NearestNeighbors
model = NearestNeighbors(metric='cosine', algorithm='brute')
model.fit(piv_table_matrix)

NearestNeighbors(algorithm='brute', metric='cosine')
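With `metric='cosine'`, the distances returned by `kneighbors` are cosine distances, 1 minus the cosine similarity, so 0 means "same direction". A consequence worth knowing here: multiplying a row by a constant does not change its cosine distance to anything. Since cell `In [8]` replaced each user's score with the anime's total entry count, every nonzero value in a row of `piv_table` is the same number, so the model is effectively comparing which users have each anime in their list rather than how they scored it. The toy check below (hypothetical data) shows both points with scikit-learn:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy matrix: rows are anime, columns are users, 1 = in the user's list
binary = np.array([
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)

# Scale each row by an arbitrary per-anime constant (like the entry counts in piv_table)
scaled = binary * np.array([[5000.0], [120.0], [70.0]])

def neighbour_distances(X):
    """Fit a brute-force cosine model on X and query it with X's first row."""
    nn = NearestNeighbors(metric="cosine", algorithm="brute").fit(X)
    distances, indices = nn.kneighbors(X[:1], n_neighbors=3)
    return distances.ravel(), indices.ravel()

d_bin, i_bin = neighbour_distances(binary)
d_scaled, i_scaled = neighbour_distances(scaled)

# Cosine distance = 1 - cosine similarity; row 0 vs row 1 computed by hand
a, b = binary[0], binary[1]
manual = 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(i_bin, np.round(d_bin, 4))        # neighbours and distances on 0/1 data
print(i_scaled, np.round(d_scaled, 4))  # unchanged after scaling every row
print(round(manual, 4))
```

If the goal is to compare anime by how they were scored, the pivot would need the users' own scores (the `rating_x` column dropped in `In [8]`) instead of the counts.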

In [18]:
piv_table.shape[0] # number of rows, i.e. distinct anime titles

11810
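This is two fewer than the 11812 anime IDs counted in cell `In [9]`. A likely cause, not verified against the data here, is that `piv_table` is indexed by `Name`, so anime IDs that share a title are merged into one row (and `pivot_table`'s default `aggfunc='mean'` averages their values). A toy illustration with hypothetical IDs and titles:

```python
import pandas as pd

# Hypothetical entries: two different anime_id values share the same Name
toy = pd.DataFrame({
    "user_id":  [1, 2, 1],
    "anime_id": [101, 102, 103],
    "Name":     ["Shared Title", "Shared Title", "Other Title"],
    "rating":   [4, 4, 7],
})

by_name = toy.pivot_table(index="Name", columns="user_id", values="rating").fillna(0)
by_id = toy.pivot_table(index="anime_id", columns="user_id", values="rating").fillna(0)
print(by_id.shape[0], "ids ->", by_name.shape[0], "rows by name")

# Titles that map to more than one anime_id
dup_titles = toy.groupby("Name")["anime_id"].nunique()
print(dup_titles[dup_titles > 1])
```

Running the same `groupby("Name")["anime_id"].nunique()` check on `rating_data` would show which titles, if any, are affected.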

## Give me recommendations!

In [19]:
# Print the 5 anime whose vectors are closest to the given title

def find_similar_animes(x, n_recommendations=5):
    titles = piv_table.index

    # Prefer an exact title match; otherwise use the first title containing x
    if x in titles:
        anime = titles.get_loc(x)
    else:
        matches = [i for i, title in enumerate(titles) if x in title]
        if not matches:
            print('{} not found in anime list'.format(x))
            return
        anime = matches[0]

    # Ask for one extra neighbour: the closest match is the anime itself
    query = piv_table.iloc[anime, :].values.reshape(1, -1)
    distance, suggestions = model.kneighbors(query, n_neighbors=n_recommendations + 1)

    print('Recommendations for {0}:\n'.format(titles[anime]))
    for i in range(1, len(distance.flatten())):
        print('{0}: {1}, with distance of {2}:'.format(i, titles[suggestions.flatten()[i]], distance.flatten()[i]))
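`find_similar_animes` prints its results, which makes them hard to reuse or test. Below is a minimal sketch of a variant that returns them instead, run on a hypothetical toy table standing in for `piv_table` and `model`. The function name `recommend` and its parameters are our own illustration, not part of any library. It also drops the query anime by index rather than assuming it is always the first neighbour, which matters if two rows are identical.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy stand-in for piv_table: rows are titles, columns are users
toy_table = pd.DataFrame(
    [[1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]],
    index=["Anime A", "Anime B", "Anime C", "Anime D"],
    dtype=float,
)
toy_model = NearestNeighbors(metric="cosine", algorithm="brute")
toy_model.fit(csr_matrix(toy_table.values))

def recommend(title, table, model, k=2):
    """Return up to k (title, cosine distance) pairs, excluding the title itself."""
    if title not in table.index:
        raise KeyError(f"{title!r} not found in anime list")
    row = table.index.get_loc(title)
    query = table.iloc[row, :].values.reshape(1, -1)
    n = min(k + 1, len(table))  # +1 because the query itself is normally returned too
    distances, indices = model.kneighbors(query, n_neighbors=n)
    results = [
        (table.index[i], float(d))
        for d, i in zip(distances.ravel(), indices.ravel())
        if i != row  # skip the query anime wherever it appears
    ]
    return results[:k]

print(recommend("Anime A", toy_table, toy_model))
```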

## Testing on some anime!

In [20]:
find_similar_animes("Dragon Ball Z")

Recommendations for Dragon Ball Z:

1: Dragon Ball, with distance of 0.1450606692427937:
2: Dragon Ball GT, with distance of 0.1776390581778602:
3: Naruto, with distance of 0.2933269117235471:
4: Death Note, with distance of 0.3084668045802015:
5: Dragon Ball Super, with distance of 0.31112001368618747:


In [21]:
find_similar_animes('Jujutsu Kaisen')

Recommendations for Jujutsu Kaisen:

1: Hataraku Saibou Black, with distance of 0.7539180280320302:
2: The God of Highschool, with distance of 0.754992600191513:
3: Sekikashita Kusanagi, with distance of 0.7624011570874131:
4: Kakushigoto, with distance of 0.7632567948818259:
5: Deca-Dence: Install, with distance of 0.763419163547513:


In [22]:
find_similar_animes('Black Clover')

Recommendations for Black Clover:

1: Boku no Hero Academia 2nd Season, with distance of 0.2719444179080356:
2: Boku no Hero Academia 3rd Season, with distance of 0.27476371654314014:
3: Boku no Hero Academia, with distance of 0.2758424253534635:
4: Tate no Yuusha no Nariagari, with distance of 0.2911902154886916:
5: Nanatsu no Taizai, with distance of 0.29178458905486815:


Reference notebook: https://www.kaggle.com/code/everydaycodings/anime-recommendation-engine-collaborative-method/notebook#Test-our-model-and-make-some-recommendations:\
Dataset: https://www.kaggle.com/datasets/hernan4444/anime-recommendation-database-2020