# Anime Recommender System

---

### Essential Libraries

Let us begin by importing the essential Python libraries.

> NumPy : Library for Numeric Computations in Python  
> Pandas : Library for Data Acquisition and Preparation  
> Matplotlib : Low-level library for Data Visualization  
> Seaborn : Higher-level library for Data Visualization  

In [1]:
# Basic Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt # we only need pyplot
sb.set() # set the default Seaborn style for graphics

---

## Collect data from myanimelist.net API

https://myanimelist.net/apiconfig/references/api/v2

Test with a single anime first to see which features are available.

In [2]:
# library for fetching api
import requests
import time # to add delay in fetching api

In [3]:
# query for anime
response_query = requests.get("https://api.myanimelist.net/v2/anime?q=Shigatsu&limit=4", 
                        headers={'X-MAL-CLIENT-ID': '6114d00ca681b7701d1e15fe11a4987e'})
print("Response:", response_query.status_code)

Response: 200


In [4]:
response_query.json()

{'data': [{'node': {'id': 23273,
    'title': 'Shigatsu wa Kimi no Uso',
    'main_picture': {'medium': 'https://api-cdn.myanimelist.net/images/anime/3/67177.jpg',
     'large': 'https://api-cdn.myanimelist.net/images/anime/3/67177l.jpg'}}},
  {'node': {'id': 28069,
    'title': 'Shigatsu wa Kimi no Uso: Moments',
    'main_picture': {'medium': 'https://api-cdn.myanimelist.net/images/anime/6/74156.jpg',
     'large': 'https://api-cdn.myanimelist.net/images/anime/6/74156l.jpg'}}},
  {'node': {'id': 38091,
    'title': 'Hachigatsu no Cinderella Nine (TV)',
    'main_picture': {'medium': 'https://api-cdn.myanimelist.net/images/anime/1824/100449.jpg',
     'large': 'https://api-cdn.myanimelist.net/images/anime/1824/100449l.jpg'}}},
  {'node': {'id': 5597,
    'title': 'Natsu no Arashi!',
    'main_picture': {'medium': 'https://api-cdn.myanimelist.net/images/anime/5/75256.jpg',
     'large': 'https://api-cdn.myanimelist.net/images/anime/5/75256l.jpg'}}}],
 'paging': {'next': 'https://api.my
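
The response above ends with a `paging` object whose `next` entry is a URL, so a query that matches more entries than `limit` can be continued page by page. A minimal sketch of that loop follows; `collect_pages`, `fetch_json`, and `max_pages` are hypothetical names, and it assumes that a missing `next` key marks the last page. In the notebook, `fetch_json` could wrap `requests.get(url, headers=...).json()`; here it is replaced by a dictionary of fake pages so the example runs offline.

```python
def collect_pages(first_url, fetch_json, max_pages=10):
    """Follow 'paging.next' links and gather every 'node' entry.

    fetch_json is a hypothetical callable: url -> parsed JSON dict.
    """
    nodes = []
    url = first_url
    for _ in range(max_pages):  # hard cap so a bad link cannot loop forever
        page = fetch_json(url)
        nodes.extend(item['node'] for item in page.get('data', []))
        url = page.get('paging', {}).get('next')
        if not url:  # assumption: no 'next' key means this was the last page
            break
    return nodes


# Offline demonstration: two fake pages stand in for real API responses
fake_pages = {
    'page1': {'data': [{'node': {'id': 1}}, {'node': {'id': 2}}],
              'paging': {'next': 'page2'}},
    'page2': {'data': [{'node': {'id': 3}}], 'paging': {}},
}
all_nodes = collect_pages('page1', fake_pages.__getitem__)
print([node['id'] for node in all_nodes])
```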

In [5]:
# take the first search result and extract its id (the anime picture is not used)
anime = response_query.json()['data'][0]['node']
anime_id = anime['id']
anime

{'id': 23273,
 'title': 'Shigatsu wa Kimi no Uso',
 'main_picture': {'medium': 'https://api-cdn.myanimelist.net/images/anime/3/67177.jpg',
  'large': 'https://api-cdn.myanimelist.net/images/anime/3/67177l.jpg'}}

In [6]:
# anime details
response_details = requests.get(f'https://api.myanimelist.net/v2/anime/{anime_id}?fields=id,title,main_picture,alternative_titles,start_date,end_date,synopsis,mean,rank,popularity,num_list_users,num_scoring_users,nsfw,created_at,updated_at,media_type,status,genres,my_list_status,num_episodes,start_season,broadcast,source,average_episode_duration,rating,pictures,background,related_anime,related_manga,recommendations,studios,statistics',
                        headers={'X-MAL-CLIENT-ID': '6114d00ca681b7701d1e15fe11a4987e'})
print("Response:", response_details.status_code)

Response: 200


In [7]:
print("Features Available:")
print("-----")
for feature in response_details.json().keys():
    print(feature)

Features Available:
-----
id
title
main_picture
alternative_titles
start_date
end_date
synopsis
mean
rank
popularity
num_list_users
num_scoring_users
nsfw
created_at
updated_at
media_type
status
genres
num_episodes
start_season
broadcast
source
average_episode_duration
rating
pictures
background
related_anime
related_manga
recommendations
studios
statistics


---
#### From the features list above, we want to `keep` the following features:
  - id
  - title
  - start_date
  - end_date
  - synopsis
  - mean
  - rank
  - popularity
  - num_list_users
  - num_scoring_users
  - nsfw
  - created_at
  - updated_at
  - media_type
  - status
  - genres
  - num_episodes
  - start_season
  - broadcast
  - source
  - average_episode_duration
  - rating
  - background
  - studios
  - statistics
#### We `drop` the following features:
  - main_picture (image not needed)
  - alternative_titles
  - pictures (image not needed)
  - related_anime
  - related_manga
  - recommendations (recommendations by MAL --> we are building our own recommendations)

*Future `response_details` queries request only the features kept above. Note that `main_picture` still appears in the saved CSV files even though it is not requested.*
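
Rather than repeating the long `fields=` string in every request, the kept features can be held in one list and joined when the URL is built. This is only a sketch: `KEPT_FIELDS`, `BASE_URL`, and `details_url` are hypothetical names, and the list mirrors the kept features above (the notebook's own queries additionally request `my_list_status`).

```python
# Features kept in the list above (hypothetical constant name)
KEPT_FIELDS = [
    'id', 'title', 'start_date', 'end_date', 'synopsis', 'mean', 'rank',
    'popularity', 'num_list_users', 'num_scoring_users', 'nsfw',
    'created_at', 'updated_at', 'media_type', 'status', 'genres',
    'num_episodes', 'start_season', 'broadcast', 'source',
    'average_episode_duration', 'rating', 'background', 'studios',
    'statistics',
]

BASE_URL = 'https://api.myanimelist.net/v2/anime'


def details_url(anime_id, fields=KEPT_FIELDS):
    """Build the anime-details URL for one id with a comma-separated field list."""
    return f"{BASE_URL}/{anime_id}?fields={','.join(fields)}"


print(details_url(23273))
```

Keeping the list in one place means that dropping or adding a feature later changes every request at once.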

---

### Get top ranked animes in various categories, creating a `{ranking_type}_ranking` feature, and store as csv file

In [8]:
ranking_categories = [
    'all',
    'airing',
    'tv',
    'ova',
    'movie',
    'special',
    'bypopularity',
    'favorite'
]

In [9]:
'''
Get the anime ids --> response_ranking.json()['data'][0]['node']['id']
Then get anime details, turn into dataframe and create {ranking_type}_ranking feature
Then store in csv
'''

def create_ranking_csv(ranking_type):
    # getting the top 500 for each category ranking
    response_ranking = requests.get(f'https://api.myanimelist.net/v2/anime/ranking?ranking_type={ranking_type}&limit=500',
                                   headers={'X-MAL-CLIENT-ID': '6114d00ca681b7701d1e15fe11a4987e'})
    print(f'Status ({ranking_type})', response_ranking.status_code)


    # querying anime details for each anime in the ranking list
    anime_ranking_list = []

    # entries arrive in ranking order, so the 1-based position is the rank
    for rank, anime in enumerate(response_ranking.json()['data'], start=1):
        anime_id = anime['node']['id']

        # query for anime details
        response_details = requests.get(f'https://api.myanimelist.net/v2/anime/{anime_id}?fields=id,title,start_date,end_date,synopsis,mean,rank,popularity,num_list_users,num_scoring_users,nsfw,created_at,updated_at,media_type,status,genres,my_list_status,num_episodes,start_season,broadcast,source,average_episode_duration,rating,background,studios,statistics',
                            headers={'X-MAL-CLIENT-ID': '6114d00ca681b7701d1e15fe11a4987e'})

        anime_details = response_details.json()
        anime_details[f'{ranking_type}_ranking'] = rank    # adding {ranking_type}_ranking feature
        anime_ranking_list.append(anime_details)

    # convert into a dataframe and store as csv file
    df = pd.DataFrame(anime_ranking_list)
    df.to_csv(f'dataset/rankings/ranking_{ranking_type}.csv', index = False)
    print('csv created!')
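
The loop in `create_ranking_csv` appends every detail response without checking its status code, so a failed request could end up stored as if it were an anime row. One possible guard, sketched under assumptions: `fetch_details` and `get_details` are hypothetical names, and `get_details` is expected to return a `(status_code, parsed_json)` pair, for example by wrapping `requests.get`. The fake getter and its error payload below are made up so the example runs without network access.

```python
import time


def fetch_details(anime_ids, get_details, delay=0.0):
    """Return (details, failed_ids), keeping only responses with status 200.

    get_details is a hypothetical callable: anime_id -> (status_code, json_dict).
    """
    details, failed = [], []
    for anime_id in anime_ids:
        status, payload = get_details(anime_id)
        if status == 200:
            details.append(payload)
        else:
            failed.append(anime_id)  # remember the id so it can be retried later
        time.sleep(delay)  # optional pause between requests
    return details, failed


def fake_get(anime_id):
    """Stand-in for a real request; the payloads here are invented."""
    if anime_id == 2:
        return 404, {'error': 'made-up error body'}
    return 200, {'id': anime_id}


details, failed = fetch_details([1, 2, 3], fake_get)
print(len(details), failed)
```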


In [10]:
# Disabled (wrapped in a string) so that re-running the notebook does not re-fetch the rankings
'''
for ranking_type in ranking_categories:
    create_ranking_csv(ranking_type)
    
    # wait 180 seconds before fetching the next ranking category
    time.sleep(180)
'''

### Get animes from 2000-2021 (up to 100 animes per season)
Uses the MAL "get seasonal anime" API.
- Note: a season may have fewer than 100 animes.

In [11]:
response = requests.get('https://api.myanimelist.net/v2/anime/season/2017/summer?limit=4',
                            headers={'X-MAL-CLIENT-ID': '6114d00ca681b7701d1e15fe11a4987e'})

In [12]:
response.json()['data'][0]

{'node': {'id': 36275,
  'title': 'Natsume Yuujinchou Roku Specials',
  'main_picture': {'medium': 'https://api-cdn.myanimelist.net/images/anime/10/87615.jpg',
   'large': 'https://api-cdn.myanimelist.net/images/anime/10/87615l.jpg'}}}

In [13]:
'''
Season name     Months
winter:         January, February, March
spring:         April, May, June
summer:         July, August, September
fall:           October, November, December
'''
seasons = [
    "winter",
    "spring",
    "summer",
    "fall"
]
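
The month table above maps directly onto the order of the `seasons` list: each season covers three consecutive months starting in January. A small sketch of that mapping, where `season_of` is a hypothetical helper that is not part of the notebook:

```python
seasons = ["winter", "spring", "summer", "fall"]


def season_of(month):
    """Return the season name for a month number (1 = January ... 12 = December)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # months 1-3 -> index 0, 4-6 -> 1, 7-9 -> 2, 10-12 -> 3
    return seasons[(month - 1) // 3]


print(season_of(7))
```

This could be used, for example, to derive a season from a `start_date` month when `start_season` is missing.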

In [14]:
def get_anime_season(year, season):
    # fetch up to 100 animes from a particular {season} of a particular {year}
    response = requests.get(f'https://api.myanimelist.net/v2/anime/season/{year}/{season}?limit=100',
                            headers={'X-MAL-CLIENT-ID': '6114d00ca681b7701d1e15fe11a4987e'})
    print(f'Status ({year}/{season})', response.status_code)
    
    
    for anime in response.json()['data']:
        anime_id = anime['node']['id']

        # query for anime details
        response_details = requests.get(f'https://api.myanimelist.net/v2/anime/{anime_id}?fields=id,title,start_date,end_date,synopsis,mean,rank,popularity,num_list_users,num_scoring_users,nsfw,created_at,updated_at,media_type,status,genres,my_list_status,num_episodes,start_season,broadcast,source,average_episode_duration,rating,background,studios,statistics',
                            headers={'X-MAL-CLIENT-ID': '6114d00ca681b7701d1e15fe11a4987e'})

        # add anime details to the global anime_list (must be defined before this function is called)
        anime_list.append(response_details.json())
    
    print(f'({year}/{season}) done!')
    print('---')


In [37]:
# List to store animes (disabled together with the fetch loop below)
'''
anime_list = []
'''

In [38]:
# Single-season test call: get_anime_season(2000, seasons[0])
'''
for year in range(2000, 2022):
    for season in seasons:
        # fetch api for {year} & {season}
        get_anime_season(year, season)

        # wait 30 seconds before fetching the next season
        time.sleep(30)


anime_df = pd.DataFrame(anime_list)
anime_df.to_csv('dataset/anime.csv', index = False)
'''

In [36]:
pd.read_csv('dataset/anime.csv').nunique()

id                          5608
title                       5608
main_picture                5608
start_date                  3235
end_date                    3224
synopsis                    5452
mean                         398
rank                        5568
popularity                  5588
num_list_users              5042
num_scoring_users           4349
nsfw                           1
created_at                  5608
updated_at                  5590
media_type                     6
status                         2
genres                      2038
num_episodes                 140
start_season                 108
broadcast                    367
source                        17
average_episode_duration    1306
rating                         5
background                   822
studios                      597
statistics                  5620
dtype: int64
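
In the counts above, `statistics` has 5620 distinct values while `id` has only 5608, so the file must hold at least 5620 rows and some ids occur more than once. A likely cause, though this is an assumption, is that the same anime was returned for more than one season. A minimal sketch of counting and removing such repeats on a toy frame; the rows below are made up and only stand in for `dataset/anime.csv`:

```python
import pandas as pd

# Toy stand-in for dataset/anime.csv; the values are invented for illustration
toy = pd.DataFrame({
    'id': [21, 95, 21],
    'title': ['One Piece', 'Turn A Gundam', 'One Piece'],
    'statistics': ["{'num_list_users': 10}",
                   "{'num_list_users': 5}",
                   "{'num_list_users': 11}"],
})

# Rows whose id has already appeared earlier in the frame
dup_count = int(toy.duplicated(subset='id').sum())

# Keep the last occurrence of each id (assumed here to be the most recent fetch)
deduped = toy.drop_duplicates(subset='id', keep='last').reset_index(drop=True)

print(dup_count, len(deduped))
```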

In [39]:
pd.read_csv('dataset/anime.csv').head()

Unnamed: 0,id,title,main_picture,start_date,end_date,synopsis,mean,rank,popularity,num_list_users,...,genres,num_episodes,start_season,broadcast,source,average_episode_duration,rating,background,studios,statistics
0,95,Turn A Gundam,{'medium': 'https://api-cdn.myanimelist.net/im...,1999-04-09,2000-04-14,"It is the Correct Century, two millennia after...",7.71,1049,2892,40743,...,"[{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...",50,"{'year': 1999, 'season': 'spring'}","{'day_of_the_week': 'friday', 'start_time': '1...",original,1445,pg_13,,"[{'id': 14, 'name': 'Sunrise'}, {'id': 1260, '...","{'status': {'watching': '2735', 'completed': '..."
1,3665,Ginga Eiyuu Densetsu Gaiden (1999),{'medium': 'https://api-cdn.myanimelist.net/im...,1999-12-24,2000-07-21,Ginga Eiyuu Densetsu Gaiden (1999) is the seco...,8.07,472,4347,17849,...,"[{'id': 1, 'name': 'Action'}, {'id': 8, 'name'...",28,"{'year': 1999, 'season': 'fall'}",,novel,1560,r,,"[{'id': 8, 'name': 'Artland'}, {'id': 207, 'na...","{'status': {'watching': '814', 'completed': '8..."
2,2471,Doraemon (1979),{'medium': 'https://api-cdn.myanimelist.net/im...,1979-04-02,2005-03-18,Nobita Nobi is a normal fourth grade student. ...,7.74,976,2553,51255,...,"[{'id': 2, 'name': 'Adventure'}, {'id': 4, 'na...",1787,"{'year': 1979, 'season': 'spring'}",,manga,660,pg,,"[{'id': 247, 'name': 'Shin-Ei Animation'}]","{'status': {'watching': '4637', 'completed': '..."
3,21,One Piece,{'medium': 'https://api-cdn.myanimelist.net/im...,1999-10-20,,"Gol D. Roger was known as the ""Pirate King,"" t...",8.63,66,26,1812581,...,"[{'id': 1, 'name': 'Action'}, {'id': 2, 'name'...",0,"{'year': 1999, 'season': 'fall'}","{'day_of_the_week': 'sunday', 'start_time': '0...",manga,1440,pg_13,Several anime-original arcs have been adapted ...,"[{'id': 18, 'name': 'Toei Animation'}]","{'status': {'watching': '1227452', 'completed'..."
4,2397,Digimon Adventure: Bokura no War Game!,{'medium': 'https://api-cdn.myanimelist.net/im...,2000-03-04,2000-03-04,This movie takes place after the Adventure ser...,7.77,924,2135,70125,...,"[{'id': 2, 'name': 'Adventure'}, {'id': 4, 'na...",1,"{'year': 2000, 'season': 'winter'}",,original,2460,pg,Digimon Adventure: Bokura no War Game was aire...,"[{'id': 18, 'name': 'Toei Animation'}]","{'status': {'watching': '653', 'completed': '6..."


In [3]:
df = pd.read_csv('dataset/rankings/ranking_airing.csv')
df.head()

Unnamed: 0,id,title,main_picture,start_date,end_date,synopsis,mean,rank,popularity,num_list_users,...,num_episodes,start_season,broadcast,source,average_episode_duration,rating,background,studios,statistics,airing_ranking
0,48583,Shingeki no Kyojin: The Final Season Part 2,{'medium': 'https://api-cdn.myanimelist.net/im...,2022-01-10,2022-04-04,Turning against his former allies and enemies ...,9.08,5.0,190,736361,...,12,"{'year': 2022, 'season': 'winter'}","{'day_of_the_week': 'monday', 'start_time': '0...",manga,1435,r,,"[{'id': 569, 'name': 'MAPPA'}]","{'status': {'watching': '475842', 'completed':...",1
1,40834,Ousama Ranking,{'medium': 'https://api-cdn.myanimelist.net/im...,2021-10-15,2022-03-25,The people of the kingdom look down on the you...,8.79,31.0,574,318964,...,23,"{'year': 2021, 'season': 'fall'}","{'day_of_the_week': 'friday', 'start_time': '0...",web_manga,1370,pg_13,The sign language depicted in Ousama Ranking i...,"[{'id': 858, 'name': 'Wit Studio'}]","{'status': {'watching': '205453', 'completed':...",2
2,21,One Piece,{'medium': 'https://api-cdn.myanimelist.net/im...,1999-10-20,,"Gol D. Roger was known as the ""Pirate King,"" t...",8.63,66.0,26,1811880,...,0,"{'year': 1999, 'season': 'fall'}","{'day_of_the_week': 'sunday', 'start_time': '0...",manga,1440,pg_13,Several anime-original arcs have been adapted ...,"[{'id': 18, 'name': 'Toei Animation'}]","{'status': {'watching': '1226947', 'completed'...",3
3,48661,JoJo no Kimyou na Bouken Part 6: Stone Ocean,{'medium': 'https://api-cdn.myanimelist.net/im...,2021-12-01,,"In Florida, 2011, Jolyne Kuujou sits in a jail...",8.54,97.0,606,304935,...,0,"{'year': 2021, 'season': 'fall'}",,manga,1469,r,,"[{'id': 287, 'name': 'David Production'}]","{'status': {'watching': '167062', 'completed':...",4
4,49721,Karakai Jouzu no Takagi-san 3,{'medium': 'https://api-cdn.myanimelist.net/im...,2022-01-08,2022-03-26,"As summer break comes to an end, Nishikata is ...",8.46,133.0,1581,109105,...,12,"{'year': 2022, 'season': 'winter'}","{'day_of_the_week': 'saturday', 'start_time': ...",manga,1448,pg_13,,"[{'id': 247, 'name': 'Shin-Ei Animation'}]","{'status': {'watching': '54138', 'completed': ...",5


In [30]:
import json

In [31]:
for row in range(0, len(df)):
    # convert from string to json
    start_season = json.loads(df['start_season'][row].replace("'", "\""))
    
    # to access the different keys
    print('start_season_year: \t', start_season['year'])
    print('start_season_season: \t', start_season['season'])
    
    # break here for now to see 1 entry
    break

start_season_year: 	 2022
start_season_season: 	 winter
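
Replacing single quotes with double quotes before `json.loads` works for `start_season`, but it would break on any value that itself contains an apostrophe. Because the stored strings are Python dict literals, `ast.literal_eval` from the standard library can parse them directly. The sketch below applies it to a toy column and splits the result into two new columns; `parse_cell` and the toy frame are hypothetical, and missing cells are assumed to map to an empty dict.

```python
import ast

import pandas as pd

# Toy stand-in for the 'start_season' column, including one missing value
toy = pd.DataFrame({
    'start_season': ["{'year': 2022, 'season': 'winter'}",
                     "{'year': 1999, 'season': 'fall'}",
                     None],
})


def parse_cell(cell):
    """Parse a stringified dict; return an empty dict for missing cells."""
    if isinstance(cell, str):
        return ast.literal_eval(cell)
    return {}


parsed = toy['start_season'].apply(parse_cell)
toy['start_season_year'] = parsed.apply(lambda d: d.get('year'))
toy['start_season_season'] = parsed.apply(lambda d: d.get('season'))

print(toy[['start_season_year', 'start_season_season']])
```

The same approach should carry over to the other stringified columns such as `genres`, `studios`, and `statistics`, which hold lists or nested dicts.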
