# KNN for anime recommendations

Existing research papers: <p>
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4121831<p>
https://ijcrt.org/papers/IJCRT2201084.pdf

K-nearest neighbours algorithm can be used for both content and collaborative filtering. Content-based approach
is simpler due to the difficulty of scale when it comes to the user-based approach. KNN calculates the distance
between items in the dataset without making any assumptions on the distribution of data. KNN is a simple model as it does not require a training phase. It doesn't have a cold start problem, i.e. when there isn't much data to work with in the beginning. However, it doesn't perform well for very large datasets and is reliant on the choice of number of neighbours.

In [54]:
#reproduction of existing solution: https://gist.github.com/Tahsin-Mayeesha/81dcdafc61b774768b64ba5201e31e0a
#article: https://medium.com/learning-machine-learning/recommending-animes-using-nearest-neighbors-61320a1a5934

#using 2023 dataset from https://www.kaggle.com/datasets/dbdmobile/myanimelist-dataset

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import seaborn as sns
%matplotlib inline

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

anime = pd.read_csv("../data/anime-dataset-2023.csv")
anime.head()

Unnamed: 0,anime_id,Name,English name,Other name,Score,Genres,Synopsis,Type,Episodes,Aired,...,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL
0,1,Cowboy Bebop,Cowboy Bebop,カウボーイビバップ,8.75,"Action, Award Winning, Sci-Fi","Crime is timeless. By the year 2071, humanity ...",TV,26.0,"Apr 3, 1998 to Apr 24, 1999",...,Sunrise,Original,24 min per ep,R - 17+ (violence & profanity),41.0,43,78525,914193.0,1771505,https://cdn.myanimelist.net/images/anime/4/196...
1,5,Cowboy Bebop: Tengoku no Tobira,Cowboy Bebop: The Movie,カウボーイビバップ 天国の扉,8.38,"Action, Sci-Fi","Another day, another bounty—such is the life o...",Movie,1.0,"Sep 1, 2001",...,Bones,Original,1 hr 55 min,R - 17+ (violence & profanity),189.0,602,1448,206248.0,360978,https://cdn.myanimelist.net/images/anime/1439/...
2,6,Trigun,Trigun,トライガン,8.22,"Action, Adventure, Sci-Fi","Vash the Stampede is the man with a $$60,000,0...",TV,26.0,"Apr 1, 1998 to Sep 30, 1998",...,Madhouse,Manga,24 min per ep,PG-13 - Teens 13 or older,328.0,246,15035,356739.0,727252,https://cdn.myanimelist.net/images/anime/7/203...
3,7,Witch Hunter Robin,Witch Hunter Robin,Witch Hunter ROBIN (ウイッチハンターロビン),7.25,"Action, Drama, Mystery, Supernatural",Robin Sena is a powerful craft user drafted in...,TV,26.0,"Jul 3, 2002 to Dec 25, 2002",...,Sunrise,Original,25 min per ep,PG-13 - Teens 13 or older,2764.0,1795,613,42829.0,111931,https://cdn.myanimelist.net/images/anime/10/19...
4,8,Bouken Ou Beet,Beet the Vandel Buster,冒険王ビィト,6.94,"Adventure, Fantasy, Supernatural",It is the dark century and the people are suff...,TV,52.0,"Sep 30, 2004 to Sep 29, 2005",...,Toei Animation,Manga,23 min per ep,PG - Children,4240.0,5126,14,6413.0,15001,https://cdn.myanimelist.net/images/anime/7/215...


In [3]:
anime[anime['Episodes']=='UNKNOWN'].head(3)

Unnamed: 0,anime_id,Name,English name,Other name,Score,Genres,Synopsis,Type,Episodes,Aired,...,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL
11,21,One Piece,One Piece,ONE PIECE,8.69,"Action, Adventure, Fantasy","Gol D. Roger was known as the ""Pirate King,"" t...",TV,UNKNOWN,"Oct 20, 1999 to ?",...,Toei Animation,Manga,24 min,PG-13 - Teens 13 or older,55.0,20,198986,1226493.0,2168904,https://cdn.myanimelist.net/images/anime/6/732...
211,235,Detective Conan,Case Closed,名探偵コナン,8.17,"Adventure, Comedy, Mystery","Shinichi Kudou, a high school student of astou...",TV,UNKNOWN,"Jan 8, 1996 to ?",...,TMS Entertainment,Manga,25 min,PG-13 - Teens 13 or older,382.0,653,13964,154061.0,334559,https://cdn.myanimelist.net/images/anime/7/751...
871,966,Crayon Shin-chan,Shin Chan,クレヨンしんちゃん,7.77,"Comedy, Ecchi",There is no such thing as an uneventful day in...,TV,UNKNOWN,"Apr 13, 1992 to ?",...,Shin-Ei Animation,Manga,21 min,G - All Ages,972.0,2228,1178,39023.0,79676,https://cdn.myanimelist.net/images/anime/10/59...


In [7]:
#filling in unknown episodes with 1 for the following genres
anime.loc[(anime["Genres"]=="Hentai") & (anime["Episodes"]=="UNKNOWN"),"Episodes"] = 1
anime.loc[(anime["Type"]=="OVA") & (anime["Episodes"]=="UNKNOWN"),"Episodes"] = 1
anime.loc[(anime["Type"] == "Movie") & (anime["Episodes"] == "UNKNOWN")] = 1

In [19]:
#known anime still running have unknown number of episodes, but we can fill them in

known_animes = {"Naruto Shippuuden":500, "One Piece":784,"Detective Conan":854, "Dragon Ball Super":86,
                "Crayon Shin chan":942, "Yu Gi Oh Arc V":148,"Shingeki no Kyojin Season 2":25,
                "Boku no Hero Academia 2nd Season":25,"Little Witch Academia TV":25}

for k,v in known_animes.items():    
    anime.loc[anime["Name"]==k,"Episodes"] = v

In [20]:
anime["Episodes"] = anime["Episodes"].map(lambda x:np.nan if x=="UNKNOWN" else x)

In [21]:
anime["Episodes"].fillna(anime["Episodes"].median(),inplace = True)

In [22]:
pd.get_dummies(anime[["Type"]]).head()

Unnamed: 0,Type_1,Type_Movie,Type_Music,Type_ONA,Type_OVA,Type_Special,Type_TV,Type_UNKNOWN
0,False,False,False,False,False,False,True,False
1,False,True,False,False,False,False,False,False
2,False,False,False,False,False,False,True,False
3,False,False,False,False,False,False,True,False
4,False,False,False,False,False,False,True,False


In [23]:
anime["Score"] = anime["Score"].map(lambda x:np.nan if x=="UNKNOWN" else x)
anime["Members"] = anime["Members"].map(lambda x:np.nan if x=="UNKNOWN" else x)

anime["Score"] = anime["Score"].astype(float)
anime["Score"].fillna(anime["Score"].median(),inplace = True)
anime["Members"] = anime["Members"].astype(float)

In [29]:
#scaling
anime_features = pd.concat([anime["Genres"].str.get_dummies(sep=","),
                            pd.get_dummies(anime[["Type"]]),
                            anime[["Score"]],anime[["Members"]],anime["Episodes"]],axis=1)
anime["Name"] = anime["Name"].map(lambda name:re.sub('[^A-Za-z0-9]+', " ", name))
anime_features.head()

Unnamed: 0,Adventure,Avant Garde,Award Winning,Boys Love,Comedy,Drama,Ecchi,Erotica,Fantasy,Girls Love,...,Type_Movie,Type_Music,Type_ONA,Type_OVA,Type_Special,Type_TV,Type_UNKNOWN,Score,Members,Episodes
0,0,0,1,0,0,0,0,0,0,0,...,False,False,False,False,False,True,False,8.75,1771505.0,26.0
1,0,0,0,0,0,0,0,0,0,0,...,True,False,False,False,False,False,False,8.38,360978.0,1.0
2,1,0,0,0,0,0,0,0,0,0,...,False,False,False,False,False,True,False,8.22,727252.0,26.0
3,0,0,0,0,0,1,0,0,0,0,...,False,False,False,False,False,True,False,7.25,111931.0,26.0
4,0,0,0,0,0,0,0,0,1,0,...,False,False,False,False,False,True,False,6.94,15001.0,52.0


In [30]:
#scaling values to be between 0-1 as we have huge number of episodes
#fill unknown scores with median of scores
from sklearn.preprocessing import MinMaxScaler

min_max_scaler = MinMaxScaler()
anime_features = min_max_scaler.fit_transform(anime_features)
np.round(anime_features,2)

array([[0.  , 0.  , 1.  , ..., 0.96, 0.47, 0.01],
       [0.  , 0.  , 0.  , ..., 0.91, 0.1 , 0.  ],
       [1.  , 0.  , 0.  , ..., 0.89, 0.19, 0.01],
       ...,
       [1.  , 0.  , 0.  , ..., 0.66, 0.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.66, 0.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.66, 0.  , 0.  ]])

In [46]:
# fit nearest neighbours

from sklearn.neighbors import NearestNeighbors
nbrs = NearestNeighbors(n_neighbors=6, algorithm='ball_tree').fit(anime_features)
distances, recommended_indices = nbrs.kneighbors(anime_features)

In [37]:
#helper functions
def get_index_from_name(name):
    return anime[anime["Name"]==name].index.tolist()[0]

In [39]:
all_anime_names = list(anime.Name.values)

In [47]:
def get_id_from_partial_name(partial):
    for name in all_anime_names:
        if partial in name:
            print(name,all_anime_names.index(name))
""" print_similar_query can search for similar animes both by id and by name. """

def print_similar_animes(query=None,id=None):
    if id:
        for id in recommended_indices[id][1:]:
            print(anime.loc[id]["Name"])
    if query:
        found_id = get_index_from_name(query)
        for id in recommended_indices[found_id][1:]:
            print(anime.loc[id]["Name"])

In [48]:
#query samples
print_similar_animes(query="Naruto")

Hunter x Hunter 2011 
Naruto Shippuuden
Nanatsu no Taizai
Bleach
One Piece


In [52]:
print_similar_animes(query="InuYasha")

Akatsuki no Yona
Hitsugi no Chaika
InuYasha Kanketsu hen
Tsubasa Chronicle
Seirei Gensouki


In [53]:
get_id_from_partial_name("InuYasha")

InuYasha 225
InuYasha Movie 4 Guren no Houraijima 421
InuYasha Movie 2 Kagami no Naka no Mugenjo 422
InuYasha Movie 3 Tenka Hadou no Ken 423
InuYasha Movie 1 Toki wo Koeru Omoi 424
InuYasha Kuroi Tessaiga 3860
InuYasha Kanketsu hen 4745
Shounen Sunday CM InuYasha hen 10784
