# MyAnimeList Anime Dataset
As of 2019/02/04

For this project we'll analyze the **Anime Recommendations** dataset from [Kaggle](https://www.kaggle.com/CooperUnion/anime-recommendations-database). The data contains the following fields:

### Anime.csv

```
anime_id - myanimelist.net's unique id identifying an anime.
name - full name of anime.
genre - comma separated list of genres for this anime.
type - movie, TV, OVA, etc.
episodes - how many episodes in this show. (1 if movie).
rating - average rating out of 10 for this anime.
members - number of community members that are in this anime's "group".
```

### Rating.csv

```
user_id - non identifiable randomly generated user id.
anime_id - the anime that this user has rated.
rating - rating out of 10 this user has assigned (-1 if the user watched it but didn't assign a rating).
```

#### Context

This data set contains user preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and give it a rating; this data set is a compilation of those ratings.



Let's start with some data analysis imports.

In [1]:
import numpy as np
import pandas as pd

Let's take a quick look at the data.

In [2]:
anime = pd.read_csv("anime.csv")
anime.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [3]:
rating = pd.read_csv("rating.csv")
rating.head()

Unnamed: 0,user_id,anime_id,rating
0,1,20,-1
1,1,24,-1
2,1,79,-1
3,1,226,-1
4,1,241,-1


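Both tables have a `rating` column, but with different meanings: in `anime` it is the average community rating, while in `rating` it is the score a single user assigned. Let's rename the latter to `user_rating`, then merge the dataframes on `anime_id`.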

In [4]:
rating.rename(columns={"rating":"user_rating"}, inplace=True)
merge_rating = pd.merge(anime,rating,on='anime_id')

# Drop rows where the user watched the anime but did not rate it (user_rating == -1)
merge_rating = merge_rating[(merge_rating['user_rating']>=0)]
merge_rating.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,user_id,user_rating
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,99,5
1,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,152,10
2,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,244,10
3,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,271,10
5,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630,322,10
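
To illustrate why the rename matters, here is a minimal sketch on two made-up toy frames (`anime_toy` and `rating_toy` are hypothetical stand-ins, not the real files). Without the rename, pandas keeps both `rating` columns but disambiguates them with `_x` and `_y` suffixes, which hides what each one means.

```python
import pandas as pd

# Toy stand-ins for anime.csv and rating.csv (values are made up for illustration).
anime_toy = pd.DataFrame({
    "anime_id": [1, 2],
    "name": ["Show A", "Show B"],
    "rating": [8.5, 7.0],      # average community rating
})
rating_toy = pd.DataFrame({
    "user_id": [10, 10, 11],
    "anime_id": [1, 2, 1],
    "rating": [9, -1, 7],      # individual rating, -1 = watched but not rated
})

# Without renaming, the clashing column gets suffixes (rating_x, rating_y).
merged_raw = pd.merge(anime_toy, rating_toy, on="anime_id")
print(list(merged_raw.columns))

# Renaming first keeps the meaning of each column explicit.
merged = pd.merge(
    anime_toy,
    rating_toy.rename(columns={"rating": "user_rating"}),
    on="anime_id",
)

# Same filter as in the notebook: keep only real ratings.
merged = merged[merged["user_rating"] >= 0]
print(merged)
```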


Let's take a look at the number of unique users and animes.

In [5]:
n_users = merge_rating.user_id.nunique()
n_items = merge_rating.anime_id.nunique()

print('Num. of Users: ' + str(n_users))
print('Num. of Anime: ' + str(n_items))

Num. of Users: 69600
Num. of Anime: 9926
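
The two counts above imply a grid of 69,600 × 9,926 (about 690 million) user-anime cells, most of which are empty because each user rates only a small share of the catalogue. A rough way to quantify this is the fraction of cells that hold a rating. The helper below, `rating_density`, is a hypothetical name introduced for this sketch and is shown on made-up toy data.

```python
import pandas as pd

def rating_density(df, user_col="user_id", item_col="anime_id"):
    """Fraction of (user, item) cells that actually hold a rating."""
    n_users = df[user_col].nunique()
    n_items = df[item_col].nunique()
    # Count each (user, item) pair once, even if it appears several times.
    n_pairs = len(df.drop_duplicates([user_col, item_col]))
    return n_pairs / (n_users * n_items)

# Toy data: 3 users, 3 anime, 4 ratings (values are made up).
toy = pd.DataFrame({
    "user_id":     [1, 1, 2, 3],
    "anime_id":    [10, 20, 10, 30],
    "user_rating": [8, 6, 9, 7],
})
density = rating_density(toy)
print(density)
```

On the real data the same call would be `rating_density(merge_rating)`.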


Create a new dataframe with only the columns we need.

In [6]:
user_ratings = merge_rating[['user_id', 'name', 'user_rating']]
user_ratings.head()

Unnamed: 0,user_id,name,user_rating
0,99,Kimi no Na wa.,5
1,152,Kimi no Na wa.,10
2,244,Kimi no Na wa.,10
3,271,Kimi no Na wa.,10
5,322,Kimi no Na wa.,10


To make the computation more efficient and reduce the memory footprint, we pivot the ratings into an anime-by-user matrix and convert its values into a SciPy sparse matrix.

In [7]:
from scipy.sparse import csr_matrix

# Pivot ratings: one row per anime, one column per user; missing ratings become 0
piv_ratings = user_ratings.pivot_table(index=['name'], 
                                       columns=['user_id'], 
                                       values='user_rating'
                                      ).fillna(0)

# Keep only users (columns) that have at least one non-zero rating
piv_ratings = piv_ratings.loc[:, (piv_ratings != 0).any(axis=0)]

# Convert the dataframe values to a SciPy sparse matrix (CSR format)
scipy_piv_ratings = csr_matrix(piv_ratings.values)

piv_ratings.head()

name / user_id,1,2,3,5,7,8,9,10,11,12,...,73507,73508,73509,73510,73511,73512,73513,73514,73515,73516
&quot;0&quot;,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"&quot;Aesop&quot; no Ohanashi yori: Ushi to Kaeru, Yokubatta Inu",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
&quot;Bungaku Shoujo&quot; Kyou no Oyatsu: Hatsukoi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
&quot;Bungaku Shoujo&quot; Memoire,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0
&quot;Bungaku Shoujo&quot; Movie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
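
The pivot above first builds a dense table and only then converts it to a sparse matrix. As an alternative, the sparse matrix can be built directly from the (anime, user, rating) triples, which avoids materialising the dense table. The sketch below uses made-up toy data and `pd.Categorical` to map titles and user ids to row and column positions; it is one possible approach, not the method used in this notebook. One caveat: SciPy sums duplicate (row, column) entries, whereas `pivot_table` averages them by default, so the two only agree when each user rates a title at most once.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Toy ratings in the same layout as user_ratings (values are made up).
toy = pd.DataFrame({
    "user_id":     [1, 1, 2, 3],
    "name":        ["Show A", "Show B", "Show A", "Show C"],
    "user_rating": [8, 6, 9, 7],
})

# Map titles and users to consecutive integer positions (sorted categories).
names = pd.Categorical(toy["name"])
users = pd.Categorical(toy["user_id"])

# Rows are anime, columns are users, matching the layout of piv_ratings.
sparse_ratings = csr_matrix(
    (toy["user_rating"].to_numpy(dtype=float), (names.codes, users.codes)),
    shape=(len(names.categories), len(users.categories)),
)

# Compare against the dense pivot on this small example.
dense = toy.pivot_table(index="name", columns="user_id", values="user_rating").fillna(0)
print((sparse_ratings.toarray() == dense.values).all())
```

The row labels are available as `names.categories` and the column labels as `users.categories`, which is what a similarity dataframe would need for its index.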


### Recommendations

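We will use cosine similarity between the rating vectors of each pair of anime: two shows count as similar when the same users rate them in a similar pattern.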

In [8]:
from sklearn.metrics.pairwise import cosine_similarity

In [9]:
item_similarity = cosine_similarity(scipy_piv_ratings)
item_sim_df = pd.DataFrame(item_similarity, index = piv_ratings.index, columns = piv_ratings.index)
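
To make the measure concrete, the sketch below computes cosine similarity on a tiny made-up matrix and checks one entry against the textbook formula, the dot product divided by the product of the vector norms. The matrix and the helper `cosine` are illustrative assumptions, not part of the dataset.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Three toy "anime" rated by four users (0 = no rating); values are made up.
ratings = np.array([
    [10.0, 8.0, 0.0, 0.0],
    [ 5.0, 4.0, 0.0, 0.0],   # same rating pattern as row 0, lower scores
    [ 0.0, 0.0, 7.0, 9.0],   # no raters in common with rows 0 and 1
])

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pairwise similarity between rows, as in the notebook.
sim = cosine_similarity(csr_matrix(ratings))
print(np.round(sim, 3))
print(cosine(ratings[0], ratings[1]))
```

Rows 0 and 1 point in the same direction, so their similarity is close to 1 even though the scores differ in scale. Row 2 shares no raters with the others, so its similarity to them is 0. This also shows a limitation of filling missing ratings with 0: unrated entries are treated as zeros rather than as unknowns.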

In [10]:
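def similar_animes(anime_name):
    print('Similar shows to {} include:\n'.format(anime_name))
    # Sort all shows by similarity to anime_name, highest first.
    # Position 0 is expected to be the show itself, so take positions 1 to 10.
    similar = item_sim_df.sort_values(by=anime_name, ascending=False).index[1:11]
    for count, item in enumerate(similar, start=1):
        print('No. {}: {}'.format(count, item))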
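
The function above relies on the queried title sorting into first place, which may not hold if another show ties with a similarity of exactly 1.0, and it fails with a plain pandas error for unknown titles. A slightly more defensive variant is sketched below. `top_similar` is a hypothetical helper, demonstrated on a made-up similarity table: it drops the queried title by label and returns the scores instead of printing them.

```python
import pandas as pd

def top_similar(sim_df, anime_name, n=10):
    """Return the n titles most similar to anime_name, excluding the title itself."""
    if anime_name not in sim_df.columns:
        raise KeyError("Unknown title: {!r}".format(anime_name))
    # Remove the queried title by label rather than by position.
    scores = sim_df[anime_name].drop(labels=anime_name)
    return scores.nlargest(n)

# Toy similarity table (values are made up for illustration).
titles = ["Show A", "Show B", "Show C"]
toy_sim = pd.DataFrame(
    [[1.0, 0.9, 0.2],
     [0.9, 1.0, 0.4],
     [0.2, 0.4, 1.0]],
    index=titles, columns=titles,
)
result = top_similar(toy_sim, "Show A", n=2)
print(result)
```

With the notebook's data the call would be `top_similar(item_sim_df, "Naruto")`.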

In [11]:
similar_animes("Naruto")

Similar shows to Naruto include:

No. 1: Death Note
No. 2: Fullmetal Alchemist
No. 3: Bleach
No. 4: Fullmetal Alchemist: Brotherhood
No. 5: Code Geass: Hangyaku no Lelouch
No. 6: Sword Art Online
No. 7: Shingeki no Kyojin
No. 8: Dragon Ball Z
No. 9: Naruto Movie 1: Dai Katsugeki!! Yuki Hime Shinobu Houjou Dattebayo!
No. 10: Ao no Exorcist


In [12]:
similar_animes("Kimi no Na wa.")

Similar shows to Kimi no Na wa. include:

No. 1: Boku dake ga Inai Machi
No. 2: Re:Zero kara Hajimeru Isekai Seikatsu
No. 3: Shigatsu wa Kimi no Uso
No. 4: ReLIFE
No. 5: One Punch Man
No. 6: Charlotte
No. 7: Noragami Aragoto
No. 8: Shokugeki no Souma
No. 9: Yahari Ore no Seishun Love Comedy wa Machigatteiru. Zoku
No. 10: Noragami


In [13]:
similar_animes("Gintama")

Similar shows to Gintama include:

No. 1: Gintama&#039;
No. 2: Gintama&#039;: Enchousen
No. 3: Gintama Movie: Kanketsu-hen - Yorozuya yo Eien Nare
No. 4: Gintama Movie: Shinyaku Benizakura-hen
No. 5: Gintama°
No. 6: Gintama: Shiroyasha Koutan
No. 7: Gintama: Nanigoto mo Saiyo ga Kanjin nano de Tasho Senobisuru Kurai ga Choudoyoi
No. 8: Durarara!!
No. 9: Danshi Koukousei no Nichijou
No. 10: Beelzebub


### Thank you!