# Building a collaborative filtering system with anime
The ur-example for collaborative filtering is the MovieLens dataset, but let's be honest: movie suggestions are nice, and yet I don't like watching similar movies. For me, media needs to be acclaimed but not too pop, and have a certain degree of cult following. So here I'm going to build a collaborative filtering model on anime ratings from the website MyAnimeList. I found a [dataset](https://www.kaggle.com/datasets/marlesson/myanimelist-dataset-animes-profiles-reviews) on Kaggle with reviews. I downloaded the files and then uploaded them to my personal Google Drive.

In [None]:
from fastai.tabular.all import *
from fastai.collab import *
import pandas as pd

In [None]:
path = Path('/content/drive/MyDrive/Machine Learning/anime')

In [None]:
Path.BASE_PATH = path

In [None]:
path.ls()

(#3) [Path('animes.csv'),Path('profiles.csv'),Path('reviews.csv')]

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Goal: Determine the latent factors from the ratings in reviews.csv and build an application which, when given a particular show's name, returns the shows most similar to it.

# Reading in data from the csv file
Examining the file structure, there are two relevant files: reviews.csv contains the reviews, while animes.csv contains the id and title of each anime. For easy data processing, I want to merge these two files into one dataframe.

In [None]:
raw_df = pd.read_csv(path/'reviews.csv',usecols=['profile','anime_uid','score'])

In [None]:
raw_df.head()

Unnamed: 0,profile,anime_uid,score
0,DesolatePsyche,34096,8
1,baekbeans,34599,10
2,skrn,28891,7
3,edgewalker00,2904,9
4,aManOfCulture99,4181,10


In [None]:
titles_df = pd.read_csv(path/'animes.csv',usecols=['uid','title','synopsis'])

In [None]:
titles_df.head()

Unnamed: 0,uid,title,synopsis
0,28891,Haikyuu!! Second Season,"Following their participation at the Inter-High, the Karasuno High School volleyball team attempts to refocus their efforts, aiming to conquer the Spring tournament instead. \r\n \r\nWhen they receive an invitation from long-standing rival Nekoma High, Karasuno agrees to take part in a large training camp alongside many notable volleyball teams in Tokyo and even some national level players. By playing with some of the toughest teams in Japan, they hope not only to sharpen their skills, but also come up with new attacks that would strengthen them. Moreover, Hinata and Kageyama attempt to d..."
1,23273,Shigatsu wa Kimi no Uso,"Music accompanies the path of the human metronome, the prodigious pianist Kousei Arima. But after the passing of his mother, Saki Arima, Kousei falls into a downward spiral, rendering him unable to hear the sound of his own piano. \r\n \r\nTwo years later, Kousei still avoids the piano, leaving behind his admirers and rivals, and lives a colorless life alongside his friends Tsubaki Sawabe and Ryouta Watari. However, everything changes when he meets a beautiful violinist, Kaori Miyazono, who stirs up his world and sets him on a journey to face music again. \r\n \r\nBased on the manga serie..."
2,34599,Made in Abyss,"The Abyss—a gaping chasm stretching down into the depths of the earth, filled with mysterious creatures and relics from a time long past. How did it come to be? What lies at the bottom? Countless brave individuals, known as Divers, have sought to solve these mysteries of the Abyss, fearlessly descending into its darkest realms. The best and bravest of the Divers, the White Whistles, are hailed as legends by those who remain on the surface. \r\n \r\nRiko, daughter of the missing White Whistle Lyza the Annihilator, aspires to become like her mother and explore the furthest reaches of the Aby..."
3,5114,Fullmetal Alchemist: Brotherhood,"""In order for something to be obtained, something of equal value must be lost."" \r\n \r\nAlchemy is bound by this Law of Equivalent Exchange—something the young brothers Edward and Alphonse Elric only realize after attempting human transmutation: the one forbidden act of alchemy. They pay a terrible price for their transgression—Edward loses his left leg, Alphonse his physical body. It is only by the desperate sacrifice of Edward's right arm that he is able to affix Alphonse's soul to a suit of armor. Devastated and alone, it is the hope that they would both eventually return to their orig..."
4,31758,Kizumonogatari III: Reiketsu-hen,"After helping revive the legendary vampire Kiss-shot Acerola-orion Heart-under-blade, Koyomi Araragi has become a vampire himself and her servant. Kiss-shot is certain she can turn him back into a human, but only once regaining her full power. \r\n \r\nAraragi has hunted down the three vampire hunters that defeated Kiss-shot and retrieved her limbs to return her to full strength. However, now that Araragi has almost accomplished what he’s been fighting for this whole time, he has to consider if this is what he really wants. Once he revives this powerful immortal vampire, there is no telli..."


In [None]:
titles_df = titles_df.rename(columns={"uid":"anime_uid"})

In [None]:
titles_df.head()

Unnamed: 0,anime_uid,title,synopsis
0,28891,Haikyuu!! Second Season,"Following their participation at the Inter-High, the Karasuno High School volleyball team attempts to refocus their efforts, aiming to conquer the Spring tournament instead. \r\n \r\nWhen they receive an invitation from long-standing rival Nekoma High, Karasuno agrees to take part in a large training camp alongside many notable volleyball teams in Tokyo and even some national level players. By playing with some of the toughest teams in Japan, they hope not only to sharpen their skills, but also come up with new attacks that would strengthen them. Moreover, Hinata and Kageyama attempt to d..."
1,23273,Shigatsu wa Kimi no Uso,"Music accompanies the path of the human metronome, the prodigious pianist Kousei Arima. But after the passing of his mother, Saki Arima, Kousei falls into a downward spiral, rendering him unable to hear the sound of his own piano. \r\n \r\nTwo years later, Kousei still avoids the piano, leaving behind his admirers and rivals, and lives a colorless life alongside his friends Tsubaki Sawabe and Ryouta Watari. However, everything changes when he meets a beautiful violinist, Kaori Miyazono, who stirs up his world and sets him on a journey to face music again. \r\n \r\nBased on the manga serie..."
2,34599,Made in Abyss,"The Abyss—a gaping chasm stretching down into the depths of the earth, filled with mysterious creatures and relics from a time long past. How did it come to be? What lies at the bottom? Countless brave individuals, known as Divers, have sought to solve these mysteries of the Abyss, fearlessly descending into its darkest realms. The best and bravest of the Divers, the White Whistles, are hailed as legends by those who remain on the surface. \r\n \r\nRiko, daughter of the missing White Whistle Lyza the Annihilator, aspires to become like her mother and explore the furthest reaches of the Aby..."
3,5114,Fullmetal Alchemist: Brotherhood,"""In order for something to be obtained, something of equal value must be lost."" \r\n \r\nAlchemy is bound by this Law of Equivalent Exchange—something the young brothers Edward and Alphonse Elric only realize after attempting human transmutation: the one forbidden act of alchemy. They pay a terrible price for their transgression—Edward loses his left leg, Alphonse his physical body. It is only by the desperate sacrifice of Edward's right arm that he is able to affix Alphonse's soul to a suit of armor. Devastated and alone, it is the hope that they would both eventually return to their orig..."
4,31758,Kizumonogatari III: Reiketsu-hen,"After helping revive the legendary vampire Kiss-shot Acerola-orion Heart-under-blade, Koyomi Araragi has become a vampire himself and her servant. Kiss-shot is certain she can turn him back into a human, but only once regaining her full power. \r\n \r\nAraragi has hunted down the three vampire hunters that defeated Kiss-shot and retrieved her limbs to return her to full strength. However, now that Araragi has almost accomplished what he’s been fighting for this whole time, he has to consider if this is what he really wants. Once he revives this powerful immortal vampire, there is no telli..."


In [None]:
ratings = raw_df.merge(titles_df,on='anime_uid')
ratings = ratings.astype({'score':'float64'})

In [None]:
ratings

Unnamed: 0,profile,anime_uid,score,title,synopsis
0,DesolatePsyche,34096,8.0,Gintama.,"After joining the resistance against the bakufu, Gintoki and the gang are in hiding, along with Katsura and his Joui rebels. The Yorozuya is soon approached by Nobume Imai and two members of the Kiheitai, who explain that the Harusame pirates have turned against 7th Division Captain Kamui and their former ally Takasugi. The Kiheitai present Gintoki with a job: find Takasugi, who has been missing since his ship was ambushed in a Harusame raid. Nobume also makes a stunning revelation regarding the Tendoushuu, a secret organization pulling the strings of numerous factions, and their leader Ut..."
1,DesolatePsyche,34096,8.0,Gintama.,"After joining the resistance against the bakufu, Gintoki and the gang are in hiding, along with Katsura and his Joui rebels. The Yorozuya is soon approached by Nobume Imai and two members of the Kiheitai, who explain that the Harusame pirates have turned against 7th Division Captain Kamui and their former ally Takasugi. The Kiheitai present Gintoki with a job: find Takasugi, who has been missing since his ship was ambushed in a Harusame raid. Nobume also makes a stunning revelation regarding the Tendoushuu, a secret organization pulling the strings of numerous factions, and their leader Ut..."
2,claudinou,34096,8.0,Gintama.,"After joining the resistance against the bakufu, Gintoki and the gang are in hiding, along with Katsura and his Joui rebels. The Yorozuya is soon approached by Nobume Imai and two members of the Kiheitai, who explain that the Harusame pirates have turned against 7th Division Captain Kamui and their former ally Takasugi. The Kiheitai present Gintoki with a job: find Takasugi, who has been missing since his ship was ambushed in a Harusame raid. Nobume also makes a stunning revelation regarding the Tendoushuu, a secret organization pulling the strings of numerous factions, and their leader Ut..."
3,claudinou,34096,8.0,Gintama.,"After joining the resistance against the bakufu, Gintoki and the gang are in hiding, along with Katsura and his Joui rebels. The Yorozuya is soon approached by Nobume Imai and two members of the Kiheitai, who explain that the Harusame pirates have turned against 7th Division Captain Kamui and their former ally Takasugi. The Kiheitai present Gintoki with a job: find Takasugi, who has been missing since his ship was ambushed in a Harusame raid. Nobume also makes a stunning revelation regarding the Tendoushuu, a secret organization pulling the strings of numerous factions, and their leader Ut..."
4,PeterFromRussia,34096,8.0,Gintama.,"After joining the resistance against the bakufu, Gintoki and the gang are in hiding, along with Katsura and his Joui rebels. The Yorozuya is soon approached by Nobume Imai and two members of the Kiheitai, who explain that the Harusame pirates have turned against 7th Division Captain Kamui and their former ally Takasugi. The Kiheitai present Gintoki with a job: find Takasugi, who has been missing since his ship was ambushed in a Harusame raid. Nobume also makes a stunning revelation regarding the Tendoushuu, a secret organization pulling the strings of numerous factions, and their leader Ut..."
...,...,...,...,...,...
317474,Kuromizue,9751,9.0,Strike Witches Movie,"After fending off the threat of a Neuroi invasion of Romagna and destroying the enemy's nest over Venezia, Yoshika Miyafuji goes back to her home town in the Empire of Fusou. Despite the loss of her magical and healing abilities, the former officer of the 501st Joint Fighter Wing wants to continue studying medicine. This is in order to help those in need, both civilians and those on the front lines alike. She receives an invitation from a prestigious school in Europe and decides to accept the offer, embarking on a journey back to the war-torn continent. \r\n \r\nHowever, a new danger arise..."
317475,ryanxwonbin,9751,8.0,Strike Witches Movie,"After fending off the threat of a Neuroi invasion of Romagna and destroying the enemy's nest over Venezia, Yoshika Miyafuji goes back to her home town in the Empire of Fusou. Despite the loss of her magical and healing abilities, the former officer of the 501st Joint Fighter Wing wants to continue studying medicine. This is in order to help those in need, both civilians and those on the front lines alike. She receives an invitation from a prestigious school in Europe and decides to accept the offer, embarking on a journey back to the war-torn continent. \r\n \r\nHowever, a new danger arise..."
317476,AobaSuzukaze,9751,10.0,Strike Witches Movie,"After fending off the threat of a Neuroi invasion of Romagna and destroying the enemy's nest over Venezia, Yoshika Miyafuji goes back to her home town in the Empire of Fusou. Despite the loss of her magical and healing abilities, the former officer of the 501st Joint Fighter Wing wants to continue studying medicine. This is in order to help those in need, both civilians and those on the front lines alike. She receives an invitation from a prestigious school in Europe and decides to accept the offer, embarking on a journey back to the war-torn continent. \r\n \r\nHowever, a new danger arise..."
317477,7jaws7,9751,9.0,Strike Witches Movie,"After fending off the threat of a Neuroi invasion of Romagna and destroying the enemy's nest over Venezia, Yoshika Miyafuji goes back to her home town in the Empire of Fusou. Despite the loss of her magical and healing abilities, the former officer of the 501st Joint Fighter Wing wants to continue studying medicine. This is in order to help those in need, both civilians and those on the front lines alike. She receives an invitation from a prestigious school in Europe and decides to accept the offer, embarking on a journey back to the war-torn continent. \r\n \r\nHowever, a new danger arise..."
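
The merged output above shows some rows repeated verbatim (the same profile, anime and score twice). I haven't checked which of the two files the repeats come from, so the following is only a minimal sketch on made-up toy data of how they could be dropped before merging. The column names match the ones used above; the rows and the choice to deduplicate titles on `anime_uid` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for reviews.csv and animes.csv (values are made up)
reviews = pd.DataFrame({
    "profile": ["alice", "alice", "bob"],
    "anime_uid": [1, 1, 2],
    "score": [8, 8, 10],
})
titles = pd.DataFrame({
    "anime_uid": [1, 1, 2],
    "title": ["Show A", "Show A", "Show B"],
})

# Drop exact duplicate reviews, and keep one title row per anime id
reviews = reviews.drop_duplicates()
titles = titles.drop_duplicates(subset="anime_uid")

# validate= makes pandas raise if an anime id still maps to several title rows
merged = reviews.merge(titles, on="anime_uid", validate="many_to_one")
merged = merged.astype({"score": "float64"})
print(merged)
```

Whether deduplicating is the right call depends on the data: if a user genuinely reviewed the same show twice with different scores, those rows would need a different rule (for example keeping the latest one).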


In [None]:
titles_df.to_pickle("descriptions.pkl")

Now that the data is loaded into a pandas dataframe, let's put it into a fast.ai dataloader using CollabDataLoaders. Here is some intuition about what the back end is probably like. A collaborative filtering model works with indices as opposed to names. So what CollabDataLoaders presumably does is first map each profile and anime title to a specific matrix index using a hashmap. When you actually build the model, the dataloader supplies those indices, so you don't have to handle any of that bookkeeping yourself.
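
To make that intuition concrete, here is a minimal pure-Python sketch of a name-to-index mapping. It is not fastai's actual implementation: `build_vocab` is a hypothetical helper, and reserving index 0 for an unknown `#na#` entry is an assumption about how such a vocabulary is usually laid out.

```python
def build_vocab(values):
    """Map each distinct value to an integer index; index 0 is kept for unknowns."""
    vocab = ["#na#"] + sorted(set(values))
    o2i = {name: i for i, name in enumerate(vocab)}
    return vocab, o2i

# A few names taken from the tables above
profiles = ["skrn", "baekbeans", "skrn"]
titles = ["Made in Abyss", "Haikyuu!! Second Season", "Made in Abyss"]

user_vocab, user_o2i = build_vocab(profiles)
title_vocab, title_o2i = build_vocab(titles)

# Each review becomes a (user index, item index) pair the model can index with
rows = [(user_o2i[p], title_o2i[t]) for p, t in zip(profiles, titles)]
print(rows)  # [(2, 2), (1, 1), (2, 2)]
```

The list gives index-to-name lookups and the dictionary gives name-to-index lookups, which is all the model needs to go back and forth.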

In [None]:
dls = CollabDataLoaders.from_df(ratings,user_name='profile',item_name='title',rating_name='score')

In [None]:
dls.train.show_batch()

Unnamed: 0,profile,title,score
0,13lueH0ur,"Gate: Jieitai Kanochi nite, Kaku Tatakaeri 2nd Season",9.0
1,ggultra2764,Kuuchuu Buranko,9.0
2,Lightniing,Ano Hi Mita Hana no Namae wo Bokutachi wa Mada Shiranai.,10.0
3,Lompa,Nana,10.0
4,NerdCitadel,Fairy Tail Movie 2: Dragon Cry,5.0
5,stormchasar,Onegai☆Twins,6.0
6,lalohuicochea,Bakemono no Ko,10.0
7,dublincore,Durarara!!,7.0
8,KevyVuong,My Imouto: Koakuma na A-Cup,10.0
9,Phoenix_Trite,K-On!,10.0


In [None]:
# dls.classes # don't run this unless you want to see all the shady usernames in full glory

The output of dls.classes suggests that the names are stored as a dictionary mapping each column ('profile', 'title') to a list of names. This makes indexing easy!

In [None]:
n_users = len(dls.classes['profile'])
n_anime = len(dls.classes['title'])
n_users,n_anime

(47886, 8114)

In [None]:
X,y = dls.one_batch()

Note that X holds integer indices, not the names passed to CollabDataLoaders. It has done the index mapping for us!

In [None]:
X

tensor([[25073,   105],
        [26737,  2781],
        [19655,  5818],
        [12568,    79],
        [16889,  1526],
        [38261,  7235],
        [15611,  3800],
        [34193,  6765],
        [ 4717,   218],
        [20516,   990],
        [33964,  7570],
        [15306,  3974],
        [ 6408,  4177],
        [11791,  1673],
        [37321,  2034],
        [25073,  2693],
        [15324,  5134],
        [30188,   771],
        [28578,  5322],
        [16271,  5586],
        [20880,  5134],
        [33199,  1175],
        [23917,  5530],
        [18933,  2763],
        [ 6563,  2279],
        [  754,  2419],
        [ 3140,  5013],
        [35901,  5322],
        [14619,  3530],
        [32115,  1752],
        [15320,  4778],
        [31713,   680],
        [40208,  1295],
        [ 6295,  5855],
        [  249,  7789],
        [32241,  1526],
        [32015,  5069],
        [14811,  3959],
        [26991,  7685],
        [27373,  4159],
        [39508,  4973],
        [30316, 

In [None]:
y

tensor([[ 7.],
        [10.],
        [10.],
        [ 8.],
        [ 8.],
        [ 8.],
        [10.],
        [10.],
        [10.],
        [10.],
        [ 8.],
        [ 8.],
        [ 9.],
        [ 9.],
        [ 5.],
        [ 8.],
        [ 9.],
        [ 8.],
        [ 7.],
        [ 7.],
        [ 8.],
        [ 9.],
        [10.],
        [ 8.],
        [10.],
        [ 7.],
        [ 8.],
        [ 9.],
        [10.],
        [ 8.],
        [ 6.],
        [ 9.],
        [10.],
        [ 7.],
        [ 9.],
        [ 8.],
        [ 3.],
        [10.],
        [ 8.],
        [10.],
        [ 6.],
        [10.],
        [ 8.],
        [ 9.],
        [10.],
        [ 7.],
        [ 7.],
        [ 6.],
        [ 2.],
        [ 8.],
        [ 5.],
        [ 7.],
        [ 9.],
        [ 9.],
        [ 7.],
        [ 4.],
        [ 3.],
        [ 8.],
        [10.],
        [ 6.],
        [ 9.],
        [10.],
        [ 8.],
        [ 6.]])

dls.one_batch() returns X, the (user, item) index pairs, and y, the corresponding scores to fit during training. Now I'm going to build the model used for training.

This is the basic class setup. The tricky part is the differentiation: you only want gradients for the rows you actually index in each batch. Luckily, PyTorch's automatic differentiation accounts for that! You just have to make your matrices parameters of the model. To create a PyTorch model, you extend the PyTorch class nn.Module. Don't forget to call super().__init__()! fastai has its own Module class which does not require that call, but I want to use nn.Module to be in keeping with PyTorch. Other objects such as optimizers can then take this module and optimize its parameters. A generic nn.Module subclass looks like this:
```
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
```
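
To check the claim about gradients only reaching the indexed rows, here is a small sketch with a toy parameter matrix. The sizes, indices and the squared-sum "loss" are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
factors = nn.Parameter(torch.randn(5, 3) * 0.1)  # 5 rows, 3 latent factors
idx = torch.tensor([1, 3])                        # rows used by this "batch"

# Any scalar built only from the indexed rows works as a stand-in loss
loss = factors[idx].pow(2).sum()
loss.backward()

# Only rows 1 and 3 receive a non-zero gradient
touched = factors.grad.abs().sum(dim=1) > 0
print(touched)
```

Note that the gradient tensor still has the full shape of the matrix; the rows that were not indexed are simply zero.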

In [None]:
# Module??

In [None]:
def initialize_params(size):
    # nn.Parameter registers the tensor with the Module and turns on requires_grad
    return nn.Parameter(torch.randn(size)*0.1)

In [None]:
class CollabFilter(nn.Module):
    def __init__(self,num_users,num_items,num_latent,y_range):
        super().__init__()
        self.user_factors = initialize_params((num_users,num_latent))
        self.item_factors = initialize_params((num_items,num_latent))
        self.y_range = y_range
        # self.user_bias = initialize_params((num_users))
        # self.item_bias = initialize_params((num_items))
    def forward(self,X):
        users = self.user_factors[X[:,0]]
        items = self.item_factors[X[:,1]]
        # ubias = self.user_bias[X]
        # itbias = self.item_bias[X]
        z = (users*items).sum(dim=1,keepdim=True).sigmoid() # dot product per row, then sigmoid to squash into (0,1)
        low, high = self.y_range
        z = z*(high-low)+low # rescale to y_range (the scale of the ratings)
        return z
        return z
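
The bias lines are commented out in the class above. As a sketch of how they could be wired in, the variant below indexes each bias vector with its own column of X rather than with X as a whole. `CollabFilterBias` is a hypothetical name, and initialising the biases to zero is my assumption, not something the model above does.

```python
import torch
import torch.nn as nn

class CollabFilterBias(nn.Module):
    def __init__(self, num_users, num_items, num_latent, y_range):
        super().__init__()
        self.user_factors = nn.Parameter(torch.randn(num_users, num_latent) * 0.1)
        self.item_factors = nn.Parameter(torch.randn(num_items, num_latent) * 0.1)
        # One scalar bias per user and per item (zero init is an assumption)
        self.user_bias = nn.Parameter(torch.zeros(num_users))
        self.item_bias = nn.Parameter(torch.zeros(num_items))
        self.y_range = y_range

    def forward(self, X):
        u, i = X[:, 0], X[:, 1]
        z = (self.user_factors[u] * self.item_factors[i]).sum(dim=1)
        z = z + self.user_bias[u] + self.item_bias[i]  # add biases before the sigmoid
        low, high = self.y_range
        return (z.sigmoid() * (high - low) + low).unsqueeze(1)

model_b = CollabFilterBias(4, 6, 3, (0, 10.5))
X_toy = torch.tensor([[0, 1], [3, 5]])  # two made-up (user, item) index pairs
out = model_b(X_toy)
print(out.shape)  # torch.Size([2, 1])
```

The output keeps the (batch, 1) shape so that it lines up with y from the dataloader.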

Yayyyy, the filtering model is defined! Let's train it using the Learner object from fast.ai. Collaborative filtering is, in essence, a regression problem: you try to predict the score a particular user would give from their coefficients (user_factors) for each of the anime attributes (item_factors). Neither set of factors is known ahead of time, but the ratings are, so you can fit both using gradient descent. Mean squared error is the usual loss for regression, so that's the loss function I'll be using.
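
Written out, the prediction and loss that the code implements look like this. The symbols are my own notation: $\mathbf{p}_u$ is the row of user_factors for user $u$, $\mathbf{q}_i$ the row of item_factors for anime $i$, $(\ell, h)$ is y_range, $\sigma$ is the sigmoid, and $N$ is the number of ratings in the batch.

$$
\hat{r}_{ui} = \ell + (h - \ell)\,\sigma\!\left(\mathbf{p}_u \cdot \mathbf{q}_i\right),
\qquad
\mathcal{L} = \frac{1}{N} \sum_{(u,i)} \left(r_{ui} - \hat{r}_{ui}\right)^2
$$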

In [None]:
model = CollabFilter(n_users,n_anime,50,(0,10.5))

In [None]:
learn = Learner(dls, model, loss_func=nn.MSELoss(reduction='mean'))
learn.fit_one_cycle(12)

epoch,train_loss,valid_loss,time
0,10.880637,10.934642,00:25
1,9.395119,9.569321,00:24
2,4.338412,5.006995,00:24
3,1.96145,2.748111,00:24
4,1.235819,2.141769,00:25
5,0.899718,1.937379,00:24
6,0.662696,1.853929,00:24
7,0.511575,1.81594,00:24
8,0.418494,1.798269,00:24
9,0.376762,1.790368,00:23


In [None]:
type(dls.classes['title'])

fastai.data.transforms.CategoryMap

Pick one of the valid titles (use ctrl-F on the output of dls.classes['title'] to find the name of the one you want)

In [None]:
name = "Beserk!"
idx = dls.classes['title'].o2i[name]
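
One thing I'd be careful about with this lookup: if I remember correctly, o2i can behave like a defaultdict and quietly return index 0 (the `#na#` entry) for a name that isn't in the vocabulary, so a typo would not raise an error. I'm treating that as an assumption rather than a checked fact. A minimal sketch of a stricter lookup on a plain list of titles follows; `find_title_index` and the toy vocabulary are hypothetical.

```python
import difflib

def find_title_index(vocab, name):
    """Return the index of `name` in `vocab`, or raise with close-match suggestions."""
    o2i = {title: i for i, title in enumerate(vocab)}
    if name in o2i:
        return o2i[name]
    suggestions = difflib.get_close_matches(name, vocab, n=3)
    raise KeyError(f"{name!r} not found; close matches: {suggestions}")

# Made-up stand-in for list(dls.classes['title'])
vocab = ["#na#", "Berserk", "Made in Abyss", "Nana"]
print(find_title_index(vocab, "Nana"))  # 3
```

With a helper like this, a misspelled name fails loudly and suggests the nearest titles instead of silently pointing at some other row.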

Use cosine similarity to find the top 5 most similar anime

In [None]:
with torch.no_grad():
  m1 = model.item_factors[idx].unsqueeze(dim=0)
  sim = nn.CosineSimilarity(dim=1)
  res = sim(m1,model.item_factors)
  # most similar
  _,mostSim = torch.topk(res,6)
  print(f"Most similar anime to {name}")
  for k in range(1,len(mostSim)):
    print(k,dls.classes['title'][mostSim[k]])

Most similar anime to Beserk!
1 Hit wo Nerae!
2 Aoki Honoo
3 Munto: Toki no Kabe wo Koete
4 Detective Conan Movie 12: Full Score of Fear
5 Fushigi no Umi no Nadia


In [None]:
torch.save(dls,'anime_titles.pkl')

In [None]:
learn.export()