### Content-based recommender system for anime series

This recommender is built from content on IMDb's listing of top anime series. Source: https://www.imdb.com/search/keyword/?keywords=anime&mode=advanced&page=1&ref_=kw_nxt&sort=moviemeter,asc
    
The metadata includes title, rank, score, synopsis and genre.

A Python web-scraping script was written to produce the data sheet, which is loaded below as `animeRatings.csv`.

### Import relevant packages and tools

In [192]:
import pandas as pd
from rake_nltk import Rake
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

### Step 1: import data and analyse its content

In [193]:
df = pd.read_csv('animeRatings.csv')
df.head()

Unnamed: 0,Rank,Score,Title,Genre,Synopsis
0,1,9.3,Avatar: The Last Airbender,"Animation, Action, Adventure","In a war-torn world of elemental magic, a youn..."
1,2,9.3,Gintama: The Final,"Animation, Action, Comedy",Add a Plot
2,3,9.3,Chris Chan: A Comprehensive History,"Documentary, Biography, History",A documentary series about Sonichu creator and...
3,4,9.2,Gintama: The Semi-Final,"Animation, Action, Comedy",A quick look at everyone in Gintama before the...
4,5,9.1,Hagane no renkinjutsushi,"Animation, Action, Adventure",Two brothers search for a Philosopher's Stone ...


In [194]:
df.tail()

Unnamed: 0,Rank,Score,Title,Genre,Synopsis
195,196,8.3,Berusaiyu no bara,"Animation, Drama, Romance","The story of Lady Oscar, a female military com..."
196,197,8.3,UFO robo: Gurendaizâ,"Animation, Action, Romance","Escaping from Vega's evil forces, the young Pr..."
197,198,8.3,Hikaru no go,"Animation, Fantasy, Sport",A shounen style anime based around the Japanes...
198,199,8.3,Hajime no ippo - Champion road,"Animation, Action, Comedy","As the new Champion, Ippo now must protect his..."
199,200,8.3,Kôkaku kidôtai: Stand alone complex - The laug...,"Animation, Action, Drama",A compilation movie featuring scenes from Ghos...


#### There are 2 null values, both in `Genre`. Locate the rows that contain them.

In [195]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Rank      200 non-null    int64  
 1   Score     200 non-null    float64
 2   Title     200 non-null    object 
 3   Genre     198 non-null    object 
 4   Synopsis  200 non-null    object 
dtypes: float64(1), int64(1), object(3)
memory usage: 7.9+ KB


In [196]:
df1 = df[df.isna().any(axis=1)]

In [197]:
df1

Unnamed: 0,Rank,Score,Title,Genre,Synopsis
58,59,8.7,Arubâto odessei gaiden: Legend of Eldean,,Add a Plot
159,160,8.4,Dragon Force,,Add a Plot


#### The genre is simply not specified for these two titles, so this is not a data issue. Some rows also lack a real synopsis (they carry the placeholder "Add a Plot"), but a synopsis is hard to impute, so we leave these rows as they are and continue.

### Step 2a: Data pre-processing. Transforming Genre to a list of genres

In [198]:
df['Genre'] = df['Genre'].map(lambda x: str(x).split(','))
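
The one-liner above keeps the space that follows each comma and turns a missing genre into the literal token `nan`, because `str()` is applied to the missing value first. A minimal sketch of a stricter splitter follows, assuming missing cells arrive as `None` or a float NaN (which is how pandas usually represents them). The helper name `split_genres` is hypothetical and not part of the original notebook.

```python
import math

def split_genres(value):
    """Split a comma-separated genre string into a clean list of genres."""
    # Treat None and float NaN (a missing cell) as "no genres".
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return []
    # Strip the space that follows each comma and drop empty pieces.
    return [part.strip() for part in str(value).split(',') if part.strip()]

print(split_genres("Animation, Action, Adventure"))
print(split_genres(float("nan")))
# For comparison: applying str() first turns a missing value into the token 'nan'.
print(str(float("nan")).split(','))
```

It could be applied with `df['Genre'] = df['Genre'].map(split_genres)`. Doing so would shift some similarity scores, since the two rows with a missing genre would no longer share a `nan` token, so the outputs shown later in this notebook would not be reproduced exactly.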


### Step 2b: Data pre-processing on synopsis. Extracting the key words from the synopsis description.

In [199]:

key_words = []
for index, row in df.iterrows():
    synopsis = row['Synopsis']
    
    # instantiating Rake; by default it uses English stopwords from NLTK
    # and discards all punctuation characters
    r = Rake()

    # extracting the words by passing the text
    r.extract_keywords_from_text(synopsis)

    # getting the dictionary with key words and their degree scores
    key_words_dict_scores = r.get_word_degrees()
    
    # assigning the key words to the new column
    key_words.append(list(key_words_dict_scores.keys()))
    

df.drop(columns = ['Synopsis'], inplace = True)
df["Key_words"] = key_words

In [200]:
df.head()

Unnamed: 0,Rank,Score,Title,Genre,Key_words
0,1,9.3,Avatar: The Last Airbender,"[Animation, Action, Adventure]","[fulfill, world, elemental, magic, dangerous, ..."
1,2,9.3,Gintama: The Final,"[Animation, Action, Comedy]","[add, plot]"
2,3,9.3,Chris Chan: A Comprehensive History,"[Documentary, Biography, History]","[documentary, series, internet, sensation, chr..."
3,4,9.2,Gintama: The Semi-Final,"[Animation, Action, Comedy]","[finale, everyone, quick, look, gintama]"
4,5,9.1,Hagane no renkinjutsushi,"[Animation, Action, Adventure]","[two, brothers, search, philosopher, revive, d..."
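
To make the idea of a "word degree" concrete, here is a simplified, standard-library-only sketch of the RAKE approach: text is cut into candidate phrases at stopwords and punctuation, and each word's degree is the total length of the phrases it appears in. This is an illustration under stated assumptions, not a reimplementation of `rake_nltk`. The tiny `STOPWORDS` set, the helper names and the example sentence are all made up for this sketch.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); NLTK's English list is much longer.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "to", "and", "before", "at"}

def candidate_phrases(text):
    """Split text into runs of consecutive non-stopword words."""
    phrases = []
    # Punctuation always ends a phrase, so each punctuation-free chunk is handled separately.
    for chunk in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def word_degrees(text):
    """Degree of a word: summed length of every candidate phrase it occurs in."""
    degrees = defaultdict(int)
    for phrase in candidate_phrases(text):
        for word in phrase:
            degrees[word] += len(phrase)
    return dict(degrees)

degrees = word_degrees("A quick look at everyone in the city before the grand finale")
print(degrees)
```

Only the dictionary keys are kept in `Key_words` above, so the degree values themselves do not influence the recommender. They only determine which words count as key words at all, namely every non-stopword.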


### Step 3: Create word representation - via bag of words using the values from the columns

In [201]:
bag_of_words= []
# Title, Rank and Score are omitted from the bag of words; only Genre and Key_words are used
columns = df.columns[3:]

for index, row in df.iterrows():
    words = ''
    for col in columns:
        words = words + ' '.join(row[col])+ ' '
    bag_of_words.append(words)
    
# print(bag_of_words)
df["Bag_of_Words"] = bag_of_words
df = df[['Title','Bag_of_Words']]

In [202]:
df.head()

Unnamed: 0,Title,Bag_of_Words
0,Avatar: The Last Airbender,Animation Action Adventure fulfill world ele...
1,Gintama: The Final,Animation Action Comedy add plot
2,Chris Chan: A Comprehensive History,Documentary Biography History documentary se...
3,Gintama: The Semi-Final,Animation Action Comedy finale everyone quic...
4,Hagane no renkinjutsushi,Animation Action Adventure two brothers sear...


### Step 4: Create the model using a count matrix

In [203]:
# instantiating and generating the count matrix
count = CountVectorizer()
count_matrix = count.fit_transform(df['Bag_of_Words'])

# creating a Series for the movie titles so they are associated to an ordered numerical
# list that can be used to match the indexes
indices = pd.Series(df['Title'])
indices[:5]

0             Avatar: The Last Airbender
1                     Gintama: The Final
2    Chris Chan: A Comprehensive History
3                Gintama: The Semi-Final
4               Hagane no renkinjutsushi
Name: Title, dtype: object

**Generating the cosine similarity matrix**

In [204]:
# generating the cosine similarity matrix
cosine_sim = cosine_similarity(count_matrix, count_matrix)
print(cosine_sim)

[[1.         0.2        0.         ... 0.06741999 0.12909944 0.12909944]
 [0.2        1.         0.         ... 0.13483997 0.38729833 0.25819889]
 [0.         0.         1.         ... 0.         0.         0.        ]
 ...
 [0.06741999 0.13483997 0.         ... 1.         0.08703883 0.08703883]
 [0.12909944 0.38729833 0.         ... 0.08703883 1.         0.16666667]
 [0.12909944 0.25819889 0.         ... 0.08703883 0.16666667 1.        ]]
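
Each entry of this matrix is the cosine of the angle between two rows of the count matrix: the dot product of the two count vectors divided by the product of their lengths. The sketch below reproduces that calculation by hand for two bags of words, using only the standard library. It assumes the default `CountVectorizer` tokenisation (lowercasing, tokens of two or more word characters). The helper names `count_tokens` and `cosine` are hypothetical, and the two input strings are rebuilt from the genres and key words displayed for 'Gintama: The Final' and 'Gintama: The Semi-Final' in the `df.head()` output.

```python
import math
import re
from collections import Counter

def count_tokens(text):
    """Lowercase the text and count tokens of two or more word characters."""
    return Counter(re.findall(r"\b\w\w+\b", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count mappings."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

final = count_tokens("Animation Action Comedy add plot")
semi_final = count_tokens("Animation Action Comedy finale everyone quick look gintama")

# 3 shared tokens, vector lengths sqrt(5) and sqrt(8): 3 / sqrt(40)
print(round(cosine(final, semi_final), 4))
```

The three shared genre tokens account for the whole score here, which illustrates how strongly the genre column drives similarity when a synopsis is short or missing.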


### Step 5: Test and run the model (recommender)

**Create the recommendation function**

In [205]:
# function that takes in movie title as input and returns the top 10 recommended movies
def recommendations(title, cosine_sim = cosine_sim):
    
    recommended_movies = []
    
    # getting the index of the movie that matches the title
    idx = indices[indices == title].index[0]

    # creating a Series with the similarity scores in descending order
    score_series = pd.Series(cosine_sim[idx]).sort_values(ascending = False)

    # getting the indexes of the 10 most similar movies
    top_10_indexes = list(score_series.iloc[1:11].index)
    
    # populating the list with the titles of the best 10 matching movies
    for i in top_10_indexes:
        recommended_movies.append(list(df['Title'])[i])
        
    return recommended_movies
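
The function above skips the first sorted entry on the assumption that it is the queried title itself. When another row has an identical bag of words (for example, two titles with no genre and only the placeholder synopsis), both score 1.0 and the sort order between them is not guaranteed, so the query may end up in its own recommendations. The function also raises a bare `IndexError` for an unknown title. Below is a minimal sketch of a variant that excludes the query by index and reports unknown titles explicitly. The name `recommend`, its parameters and the toy data are hypothetical and not taken from the original notebook.

```python
def recommend(title, titles, similarity, top_n=10):
    """Return up to top_n titles most similar to `title`, excluding the title itself."""
    if title not in titles:
        raise KeyError(f"Unknown title: {title!r}")
    idx = titles.index(title)
    # Pair every other row index with its similarity to the query row.
    scored = [(score, i) for i, score in enumerate(similarity[idx]) if i != idx]
    # Highest score first; ties fall back to the original row order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [titles[i] for _, i in scored[:top_n]]

# Toy data: "A" and "D" have identical content, so their similarity is 1.0.
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.2, 0.0, 1.0],
    [0.2, 1.0, 0.5, 0.2],
    [0.0, 0.5, 1.0, 0.0],
    [1.0, 0.2, 0.0, 1.0],
]
print(recommend("D", titles, similarity, top_n=2))
```

With the notebook's data it could be called as `recommend('Jujutsu Kaisen', list(df['Title']), cosine_sim)`.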

In [187]:
recommendations('Gintama: The Final')

['Girls und Panzer das Finale: Part III',
 'Strongbad_email.exe: Disc Four',
 'The Legend of the Galactic Heroes: Die Neue These - Seiran 3',
 'Uchûsen Sajitteriasu',
 'Arubâto odessei gaiden: Legend of Eldean',
 'Dragon Force',
 'Zorori the Naughty Hero: The secret of ZZ',
 'Gintama: The Semi-Final',
 'One Punch Man: Wanpanman',
 'Initial D: First Stage']

#### Jujutsu Kaisen was such a massive hit and I really enjoyed it. Now I'm suffering from Jujutsu Kaisen withdrawal symptoms and I'm looking for another anime similar to it

In [206]:
recommendations('Jujutsu Kaisen')

['Initial D: First Stage',
 'Gekijouban Poketto monsutâ: koko',
 'Boku no hîrô akademia',
 'Rurôni Kenshin - Meiji kenkaku romantan',
 'Meitantei Conan',
 'Kaubôi bibappu',
 'Doragon kuesuto: Dai no daibouken',
 'Rurouni Kenshin: Meiji Kenkaku Romantan: Tsuioku Hen',
 'Kyaputen Tsubasa',
 'Shingeki no Kyojin: Chronicle']

**The output lists the recommendations for the input title, sorted by descending cosine similarity**

In [207]:
recommendations('Boku no hîrô akademia')

['Doragon kuesuto: Dai no daibouken',
 'Gekijouban Poketto monsutâ: koko',
 'Invincible',
 'Meitantei Conan',
 'Kaubôi bibappu',
 'Shingeki no Kyojin: Chronicle',
 'Kyaputen Tsubasa',
 'Rurouni Kenshin: Meiji Kenkaku Romantan: Tsuioku Hen',
 'Hunter x Hunter',
 'Ninja Senshi Tobikage']