# Cosine TF-IDF (Term Frequency-Inverse Document Frequency) similarity

TF-IDF is a measure of how frequently a term appears in a document, weighed against how frequently that term appears across the whole collection of documents.

The TF-IDF score is the product TF × IDF. A higher score means the term is more distinctive for that document: it occurs often there and rarely in the rest of the collection.
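
As a rough illustration of that product, the sketch below computes a textbook TF-IDF by hand on a made-up three-document corpus. The corpus, the helper name `tf_idf`, and the unsmoothed formula `tf * log(N / df)` are assumptions for illustration only; scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its numbers will differ.

```python
import math

# Hypothetical toy corpus (not taken from the anime dataset)
docs = [
    "space bounty hunter",
    "space pirate crew",
    "high school romance",
]
tokenized = [d.split() for d in docs]

def tf_idf(term, doc_tokens, corpus):
    """Textbook TF-IDF: relative term frequency times log(N / document frequency).

    Assumes the term occurs in at least one document of the corpus.
    """
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(term in tokens for tokens in corpus)
    idf = math.log(len(corpus) / df)
    return tf * idf

common = tf_idf("space", tokenized[0], tokenized)   # "space" is in 2 of 3 documents
rare = tf_idf("bounty", tokenized[0], tokenized)    # "bounty" is in 1 of 3 documents
print(common, rare)
```

Both terms occur once in the first document, so their TF is identical; the rarer term ends up with the higher score purely because of its larger IDF.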

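After calculating the TF-IDF scores, each synopsis is represented as a vector of term weights. The similarity between two synopses is then the cosine of the angle between their vectors: 1 means they point in the same direction, 0 means they share no weighted terms.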
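
A minimal sketch of the cosine step, using made-up weight vectors over a three-term vocabulary. The helper name `cosine` and the vectors are hypothetical; the notebook itself relies on scikit-learn's `cosine_similarity` for this.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical TF-IDF weights for three documents over a 3-term vocabulary
doc_a = np.array([0.5, 0.5, 0.0])
doc_b = np.array([0.4, 0.6, 0.0])
doc_c = np.array([0.0, 0.0, 1.0])

sim_ab = cosine(doc_a, doc_b)  # documents share their terms, so the value is close to 1
sim_ac = cosine(doc_a, doc_c)  # documents share no terms, so the value is 0
print(sim_ab, sim_ac)
```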

In [2]:
import pandas as pd
import numpy as np

df_anime = pd.read_csv('../data/anime-dataset-2023.csv')
df_anime['Synopsis'].head()

0    Crime is timeless. By the year 2071, humanity ...
1    Another day, another bounty—such is the life o...
2    Vash the Stampede is the man with a $$60,000,0...
3    Robin Sena is a powerful craft user drafted in...
4    It is the dark century and the people are suff...
Name: Synopsis, dtype: object

In [5]:
# Create the TF-IDF matrix (one row per synopsis, one column per term), ignoring English stop words
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df_anime['Synopsis'])

In [8]:
# Compute cosine similarity between every pair of anime synopses
from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(tfidf_matrix)
similarity_df = pd.DataFrame(similarity, 
                             index=df_anime['Name'], 
                             columns=df_anime['Name'])
similarity_df.head(10)

Name,Cowboy Bebop,Cowboy Bebop: Tengoku no Tobira,Trigun,Witch Hunter Robin,Bouken Ou Beet,Eyeshield 21,Hachimitsu to Clover,Hungry Heart: Wild Striker,Initial D Fourth Stage,Monster,...,"Die, Please!",Miru,Wo Mengjian ni Mengjian wo,Thailand,Energy,Wu Nao Monu,Bu Xing Si: Yuan Qi,Di Yi Xulie,Bokura no Saishuu Sensou,Shijuuku Nichi
Cowboy Bebop,1.0,0.265262,0.020053,0.040867,0.001554,0.017027,0.0,0.005313,0.0,0.009678,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Cowboy Bebop: Tengoku no Tobira,0.265262,1.0,0.038163,0.016617,0.004106,0.022083,0.011153,0.012127,0.008221,0.013432,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Trigun,0.020053,0.038163,1.0,0.005122,0.012405,0.00891,0.003156,0.0,0.0,0.023464,...,0.008295,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Witch Hunter Robin,0.040867,0.016617,0.005122,1.0,0.014875,0.121858,0.0,0.014676,0.008,0.0,...,0.003352,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Bouken Ou Beet,0.001554,0.004106,0.012405,0.014875,1.0,0.056452,0.001965,0.0,0.0,0.009695,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Eyeshield 21,0.017027,0.022083,0.00891,0.121858,0.056452,1.0,0.010987,0.013827,0.017035,0.011004,...,0.007967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Hachimitsu to Clover,0.0,0.011153,0.003156,0.0,0.001965,0.010987,1.0,0.0,0.025009,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Hungry Heart: Wild Striker,0.005313,0.012127,0.0,0.014676,0.0,0.013827,0.0,1.0,0.026522,0.0,...,0.020025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Initial D Fourth Stage,0.0,0.008221,0.0,0.008,0.0,0.017035,0.025009,0.026522,1.0,0.0,...,0.016066,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Monster,0.009678,0.013432,0.023464,0.0,0.009695,0.011004,0.0,0.0,0.0,1.0,...,0.01637,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
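
The table above is one slice of a full n × n matrix, which grows quadratically with the number of titles. When only one title's neighbours are needed, a single row can be computed instead. The sketch below shows this on three made-up synopses that stand in for `df_anime['Synopsis']`; the variable names are illustrative and this is not how the notebook itself proceeds.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical stand-in for df_anime['Synopsis']
synopses = [
    "A bounty hunter crew travels through space",
    "Space bounty hunters chase a dangerous criminal",
    "Students fall in love at an art college",
]

tfidf_matrix = TfidfVectorizer(stop_words='english').fit_transform(synopses)

# Similarity of document 0 against every document: one row, not the full n x n matrix
row = cosine_similarity(tfidf_matrix[0], tfidf_matrix).ravel()

# TfidfVectorizer L2-normalizes each row by default, so a plain dot product
# (linear_kernel) yields the same values as cosine similarity
row_dot = linear_kernel(tfidf_matrix[0], tfidf_matrix).ravel()
print(row, row_dot)
```

The first entry is the document compared with itself, the second is positive because the two space synopses share terms, and the third is zero because nothing overlaps once stop words are removed.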


In [13]:
# anime list 
anime_list = similarity_df.columns.values


# sample anime
anime = 'InuYasha'

# number of recommendations to return
top_n = 10

# get anime similarity records
anime_sim = similarity_df[similarity_df.index == anime].values[0]

# get anime indices sorted by similarity, most similar first
sorted_anime_ids = np.argsort(anime_sim)[::-1]

# get recommended anime names, skipping position 0 (the query anime itself)
recommended_anime = anime_list[sorted_anime_ids[1:top_n+1]]

print('\n\nTop Recommended Anime for:', anime, 'are:-\n', recommended_anime)



Top Recommended Anime for: InuYasha are:-
 ['InuYasha Movie 1: Toki wo Koeru Omoi' 'InuYasha: Kanketsu-hen'
 'Shounen Sunday CM: InuYasha-hen' 'InuYasha Movie 3: Tenka Hadou no Ken'
 'InuYasha Movie 4: Guren no Houraijima' 'InuYasha: Kuroi Tessaiga'
 'Kiratto Pri☆chan Season 2' 'Jewelpet Twinkle☆'
 'InuYasha Movie 2: Kagami no Naka no Mugenjo'
 'Shounen Sunday CM: Kyoukai no Rinne']
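
The ranking in the cell above relies on `np.argsort`, which returns indices in ascending order of value. Reversing that order puts the most similar title first, and slicing from position 1 skips the query, on the assumption that the query is its own best match. A small sketch with made-up titles and scores:

```python
import numpy as np

# Hypothetical titles and similarity scores for the query 'A'
titles = np.array(['A', 'B', 'C', 'D'])
scores = np.array([1.0, 0.1, 0.7, 0.3])  # 'A' compared with itself scores 1.0

order = np.argsort(scores)[::-1]   # indices from most to least similar
top_2 = titles[order[1:3]]         # skip position 0, which is 'A' itself
print(top_2)
```

If another title had an identical synopsis and also scored 1.0, the query might not land at position 0. Filtering the query out by name would be a more robust alternative in that case.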


In [19]:
def content_anime_recommender(
    input_anime, similarity_database=similarity_df, anime_database_list=anime_list, top_n=10):
    """Print the top_n anime whose synopses are most similar to that of input_anime."""

    # get anime similarity records
    anime_sim = similarity_database[similarity_database.index == input_anime].values[0]
    
    # get anime sorted by similarity
    sorted_anime_ids = np.argsort(anime_sim)[::-1]
    
    # get recommended anime names
    recommended_anime = anime_database_list[sorted_anime_ids[1:top_n+1]]
    
    print('\n\nTop Recommended Anime for:', input_anime, 'are:-\n', recommended_anime)

sample_anime = ['Death Note', 'Cowboy Bebop', 'Bleach', 
                 'Fruits Basket', 'Monster']
                 
for i in sample_anime:
    content_anime_recommender(i)



Top Recommended Anime for: Death Note are:-
 ['Death Note: Rewrite' 'Mugen no Hi' 'Sekaikei Sekai Ron' 'gdMen'
 'WONDER LiGHT' 'JK to Ero Giin Sensei' 'Dia Horizon (Kabu)'
 'Ore no Nounai Sentakushi ga, Gakuen Love Comedy wo Zenryoku de Jama Shiteiru OVA'
 'JK to Ero Konbini Tenchou' 'Ji Jia Shou Shen: Baolie Feiche']


Top Recommended Anime for: Cowboy Bebop are:-
 ['Cowboy Bebop: Tengoku no Tobira' 'Cowboy Bebop: Ein no Natsuyasumi'
 'Saru Getchu Movie: Ougon no Pipo Helmet - Ukki Battle'
 'Kurogane Communication' 'Kandagawa Jet Girls Recap'
 'Kandagawa Jet Girls' 'Phantasy Star Online 2: Episode Oracle'
 'Umeboshi Denka' 'Bounty Hunter: The Hard'
 'Saraba Uchuu Senkan Yamato: Ai no Senshi-tachi']


Top Recommended Anime for: Bleach are:-
 ['Bleach: Sennen Kessen-hen'
 'Bleach Movie 3: Fade to Black - Kimi no Na wo Yobu'
 'Bleach Movie 1: Memories of Nobody' 'Bleach Movie 4: Jigoku-hen'
 'Yume-iro Pâtissière SP Professional' 'Tokyo Mew Mew New ♡'
 'Aikatsu! Movie' 'Tokyo Mew Mew' '