<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>📜 Introduction:</font></h3>

* This dataset contains user preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and give it a rating; the dataset is a compilation of those ratings.

# Content

1. [Importing & Reading Data](#1)
1. [Association Rule Learning](#2)
    * [Data Preprocessing](#3)
    * [Anime-User Matrix](#4)
    * [Association Rules Analysis](#5)
    * [Product Recommendation](#6)
1. [Content Based Filtering](#7)
    * [Creating the TF-IDF Matrix](#8)
    * [Cosine Sim Calculator](#9)
    * [Recommendation Based on Similarities](#10)

<a id="1"></a>
<h1 style="border-radius: 10px; border: 2px solid #6B8E23; background-color: #F5F5DC; font-family: 'Pacifico', cursive; font-size: 200%; text-align: center; border-radius: 15px 50px; padding: 15px; box-shadow: 5px 5px 5px #556B2F; color: #556B2F;">Importing & Reading Data</h1>

In [1]:
import numpy as np
import pandas as pd

from mlxtend.frequent_patterns import apriori, association_rules

import warnings
warnings.filterwarnings("ignore")

In [2]:
anime = pd.read_csv("/kaggle/input/anime-recommendations-database/anime.csv")
rating = pd.read_csv("/kaggle/input/anime-recommendations-database/rating.csv")

In [3]:
anime.head()

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


In [4]:
rating.head()

| | user_id | anime_id | rating |
|---|---|---|---|
| 0 | 1 | 20 | -1 |
| 1 | 1 | 24 | -1 |
| 2 | 1 | 79 | -1 |
| 3 | 1 | 226 | -1 |
| 4 | 1 | 241 | -1 |


In [5]:
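# Give the two "rating" columns distinct names before merging
anime.rename(columns={"rating": "animerating"}, inplace=True)
rating.rename(columns={"rating": "userrating"}, inplace=True)

# -1 means the user watched the anime but did not rate it, so drop those rows
rating = rating[rating["userrating"] != -1]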

<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>👀 Features:</font></h3>

* **Anime.csv**
    * **anime_id** - myanimelist.net's unique ID for an anime.
    * **name** - full name of the anime.
    * **genre** - comma-separated list of genres for this anime.
    * **type** - Movie, TV, OVA, etc.
    * **episodes** - number of episodes in the show (1 if it is a movie).
    * **rating** - average rating out of 10 for this anime.
    * **members** - number of community members in this anime's "group".
* **Rating.csv**
    * **user_id** - non-identifiable, randomly generated user ID.
    * **anime_id** - the anime this user has rated.
    * **rating** - rating out of 10 assigned by this user (-1 if the user watched it but did not assign a rating).

In [6]:
df = pd.merge(anime, rating, how="inner", on="anime_id")

df.head()

| | anime_id | name | genre | type | episodes | animerating | members | user_id | userrating |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 | 99 | 5 |
| 1 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 | 152 | 10 |
| 2 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 | 244 | 10 |
| 3 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 | 271 | 10 |
| 4 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 | 322 | 10 |


In [7]:
df.shape  # 6,337,239 ratings (after dropping -1 entries), 9 columns

(6337239, 9)

In [8]:
df.isnull().sum()

anime_id        0
name            0
genre          88
type            4
episodes        0
animerating     5
members         0
user_id         0
userrating      0
dtype: int64

<a id="2"></a>
<h1 style="border-radius: 10px; border: 2px solid #6B8E23; background-color: #F5F5DC; font-family: 'Pacifico', cursive; font-size: 200%; text-align: center; border-radius: 15px 50px; padding: 15px; box-shadow: 5px 5px 5px #556B2F; color: #556B2F;">Association Rule Learning</h1>

<a id = "3"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Data Preprocessing✨</p>

In [9]:
df["name"].nunique()  # 9,926 unique anime

9926

In [10]:
df["name"].value_counts()

name
Death Note                                            34226
Sword Art Online                                      26310
Shingeki no Kyojin                                    25290
Code Geass: Hangyaku no Lelouch                       24126
Angel Beats!                                          23565
                                                      ...  
Manga Doushite Monogatari                                 1
Manabu no Natsuyasumi                                     1
Mameshi-Pamyu-Pamyu                                       1
Maku                                                      1
Violence Gekiga Shin David no Hoshi: Inma Densetsu        1
Name: count, Length: 9926, dtype: int64

In [11]:
import statsmodels.stats.api as sms

low_conf, up_conf = sms.DescrStatsW(df["anime_id"].value_counts()).tconfint_mean()

print(f"Lower Confidence Interval: {low_conf:.0f}")
print(f"Upper Confidence Interval: {up_conf:.0f}")

Lower Confidence Interval: 603
Upper Confidence Interval: 674


<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>
    
A confidence interval is a range of plausible values for an unknown population parameter, estimated from a sample. It is bounded by a lower limit and an upper limit.

The interval is computed at a chosen confidence level; `tconfint_mean()` uses `alpha=0.05` by default, which gives a 95% interval. The 95% describes the procedure rather than any single interval: if we drew many samples and built an interval from each, about 95% of those intervals would contain the true mean. It does not mean there is a 95% probability that the data fall inside this particular interval.

Here the sample is the number of ratings each anime received, so 603 and 674 are the lower and upper bounds of the 95% confidence interval for the mean number of ratings per anime. The next cell uses the lower bound, 603, as the cut-off for treating an anime as commonly rated.

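As a rough illustration of what `tconfint_mean()` computes, the interval is the sample mean plus or minus a critical value times the standard error, $\bar{x} \pm t_{0.975,\,n-1} \cdot s / \sqrt{n}$. The stdlib sketch below follows that formula with one stated assumption: it substitutes the normal critical value from `statistics.NormalDist` for Student's t, which is a close approximation when, as here, there are thousands of anime (one rating count per anime). `mean_confint` is a hypothetical helper, not part of statsmodels.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confint(values, confidence=0.95):
    """Approximate two-sided confidence interval for the mean of `values`.

    Hypothetical helper: it uses the normal critical value in place of
    Student's t, which is close when the sample has thousands of values.
    """
    n = len(values)
    center = mean(values)
    std_error = stdev(values) / sqrt(n)  # sample standard deviation / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return center - z * std_error, center + z * std_error


# Toy data: 1,000 synthetic rating counts between 600 and 660.
counts = [600 + (i % 7) * 10 for i in range(1000)]
low, high = mean_confint(counts)
print(f"Lower bound: {low:.0f}, upper bound: {high:.0f}")
```
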
In [12]:
rating_counts = pd.DataFrame(df["anime_id"].value_counts())
rare_animes = rating_counts[rating_counts["count"] < low_conf].index
common_anime = df[~df["anime_id"].isin(rare_animes)]

common_anime["name"].nunique()

2007

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>
    
We dropped every anime whose number of ratings is below the lower bound of the confidence interval (603), which leaves 2,007 commonly rated anime.

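The same filter can be written without the intermediate `rating_counts` frame by mapping each row's `anime_id` to its rating count. A minimal pandas sketch on a toy frame; the column names mirror `df`, and `min_count` is an assumed stand-in for `low_conf`.

```python
import pandas as pd

# Toy ratings: anime 1 has three ratings, anime 2 has two, anime 3 has one.
ratings = pd.DataFrame({
    "user_id":  [10, 11, 12, 10, 13, 14],
    "anime_id": [1, 1, 1, 2, 2, 3],
})

min_count = 2  # stands in for the lower confidence bound (low_conf)

# Number of ratings per anime, broadcast back onto every row.
counts_per_row = ratings["anime_id"].map(ratings["anime_id"].value_counts())

# Keep only rows whose anime has at least `min_count` ratings.
common = ratings[counts_per_row >= min_count]
print(sorted(common["anime_id"].unique().tolist()))  # [1, 2]
```
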
<a id = "4"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Anime-User Matrix✨</p>

In [13]:
# True where the user rated the anime, False otherwise (one-hot input for apriori)
user_anime_matrix = common_anime.groupby(["user_id", "anime_id"])["userrating"].count().unstack().notnull()

user_anime_matrix

| user_id \ anime_id | 1 | 5 | 6 | 7 | 15 | 16 | 18 | 19 | 20 | 22 | ... | 32668 | 32681 | 32729 | 32828 | 32935 | 32998 | 33028 | 33558 | 34103 | 34240 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | False | False |
| 5 | False | False | True | False | True | False | True | False | True | True | ... | False | False | False | True | False | False | False | False | False | False |
| 7 | False | False | False | False | False | False | False | False | False | True | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73512 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 73513 | True | True | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 73514 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 73515 | True | True | True | False | False | False | False | True | False | False | ... | False | False | False | False | False | False | False | False | False | False |

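For comparison, a boolean user × anime matrix of the same form can also be built with `pd.crosstab`, which counts each (user, anime) pair; comparing the counts with zero marks every anime a user rated. This is an alternative sketch on toy data, not the notebook's own code.

```python
import pandas as pd

# Toy ratings: each row is one (user, anime) rating.
ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 3, 3, 3],
    "anime_id": [20, 24, 20, 20, 24, 79],
})

# Rows are users, columns are anime; cells count how often the pair occurs.
counts = pd.crosstab(ratings["user_id"], ratings["anime_id"])

# True where the user rated the anime, False otherwise (the apriori input format).
user_anime_matrix = counts > 0
print(user_anime_matrix)
```
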

<a id = "5"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Association Rules Analysis✨</p>

In [14]:
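frequent_itemsets = apriori(user_anime_matrix, min_support=0.1, use_colnames=True, low_memory=True)

# min_threshold=0.01 is below min_support=0.1, so every rule derived from the frequent itemsets is kept
rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.01)

# Show the most frequent itemsets (only the last expression of a cell is displayed)
frequent_itemsets.sort_values("support", ascending=False)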

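For reference, the metrics behind these rules: the support of an itemset is the share of users who rated every anime in it; a rule A → B has confidence support(A ∪ B) / support(A) and lift confidence / support(B), so a lift above 1 means users who rated A rated B more often than users overall. The pure-Python sketch below evaluates one rule on toy data to illustrate these definitions; it is not mlxtend's implementation, and `support` and `rule_metrics` are hypothetical helpers.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    conf = both / support(transactions, antecedent)
    lift = conf / support(transactions, consequent)
    return both, conf, lift


# Toy data: the set of anime each of five users has rated.
users = [
    {"Naruto", "Death Note"},
    {"Naruto", "Death Note", "Bleach"},
    {"Naruto", "Bleach"},
    {"Death Note"},
    {"Bleach"},
]

sup, conf, lift = rule_metrics(users, {"Naruto"}, {"Death Note"})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
# support=0.40 confidence=0.67 lift=1.11
```
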
<a id = "6"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Product Recommendation✨</p>

In [15]:
def arl_recommender(rules_df, product_id, rec=1):
    """Return up to `rec` consequent items from rules whose antecedents contain `product_id`, ordered by lift."""
    sorted_rules = rules_df.sort_values("lift", ascending=False)

    recommendation_list = []

    for i, antecedents in enumerate(sorted_rules["antecedents"]):
        # Only rules whose left-hand side includes the product are relevant
        if product_id in antecedents:
            for item in sorted_rules.iloc[i]["consequents"]:
                if item not in recommendation_list:
                    recommendation_list.append(item)

    return recommendation_list[:rec]

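A vectorized variant of the same idea, assuming `rules_df` has the `antecedents`, `consequents` and `lift` columns produced by `association_rules`: filter the rules whose antecedents contain the product, sort by lift, then flatten the consequents. `arl_recommender_vectorized` is a hypothetical name and the toy rules below are made up for illustration. Rules with tied lift values may come out in a different order than in the loop version.

```python
import pandas as pd


def arl_recommender_vectorized(rules_df, product_id, rec=1):
    """Up to `rec` consequent items from rules whose antecedents contain
    `product_id`, ordered by descending lift (hypothetical variant)."""
    # Keep only rules whose antecedent itemset includes the product.
    mask = rules_df["antecedents"].apply(lambda items: product_id in items)
    matching = rules_df[mask].sort_values("lift", ascending=False)
    # One row per consequent item, keeping the lift order; then remove repeats.
    items = matching["consequents"].apply(list).explode()
    return items.drop_duplicates().head(rec).tolist()


# Toy rules in the shape returned by mlxtend's association_rules (values made up).
rules = pd.DataFrame({
    "antecedents": [frozenset({20}), frozenset({20, 101}), frozenset({101})],
    "consequents": [frozenset({102}), frozenset({103}), frozenset({20})],
    "lift": [1.5, 2.0, 1.8],
})

print(arl_recommender_vectorized(rules, product_id=20, rec=2))  # [103, 102]
```
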
In [16]:
common_anime[["anime_id","name"]][common_anime["name"].str.contains("Naruto$")].drop_duplicates().head(5)

| | anime_id | name |
|---|---|---|
| 2744301 | 20 | Naruto |


In [17]:
suggest_list = arl_recommender(rules, product_id=20, rec=5)  # 20 is Naruto's anime_id

In [18]:
def check_id(data, anime_id):
    """Return the name of the anime with the given anime_id."""
    return data["name"][data["anime_id"] == anime_id].iloc[0]

In [19]:
for suggest in suggest_list:
    print(check_id(anime, anime_id=suggest))

Code Geass: Hangyaku no Lelouch R2
Death Note
Code Geass: Hangyaku no Lelouch
Sword Art Online
Shingeki no Kyojin

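`check_id` scans the whole `anime` table on every call. When translating several IDs, a mapping built once does the same job with one dictionary lookup per ID. A sketch on a toy frame whose rows are copied from `anime.head()`; `id_to_name` is a hypothetical name, and the sketch assumes `anime_id` values are unique.

```python
import pandas as pd

# Toy frame with three rows copied from anime.head().
anime = pd.DataFrame({
    "anime_id": [32281, 5114, 9253],
    "name": ["Kimi no Na wa.", "Fullmetal Alchemist: Brotherhood", "Steins;Gate"],
})

# Build the anime_id -> name mapping once (assumes anime_id values are unique).
id_to_name = dict(zip(anime["anime_id"], anime["name"]))

suggest_list = [9253, 5114]
for anime_id in suggest_list:
    print(id_to_name[anime_id])
# Steins;Gate
# Fullmetal Alchemist: Brotherhood
```
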

<a id="7"></a>
<h1 style="border-radius: 10px; border: 2px solid #6B8E23; background-color: #F5F5DC; font-family: 'Pacifico', cursive; font-size: 200%; text-align: center; border-radius: 15px 50px; padding: 15px; box-shadow: 5px 5px 5px #556B2F; color: #556B2F;">Content Based Filtering</h1>

<a id = "8"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Creating the TF-IDF Matrix✨</p>

In [20]:
import pandas as pd

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import warnings
warnings.filterwarnings("ignore")

In [21]:
anime = pd.read_csv("/kaggle/input/anime-recommendations-database/anime.csv")

In [22]:
anime.head()

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


In [23]:
anime.isnull().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64

In [24]:
anime.dropna(subset=['genre'], inplace=True)

In [25]:
anime["genre"].head()

0                 Drama, Romance, School, Supernatural
1    Action, Adventure, Drama, Fantasy, Magic, Mili...
2    Action, Comedy, Historical, Parody, Samurai, S...
3                                     Sci-Fi, Thriller
4    Action, Comedy, Historical, Parody, Samurai, S...
Name: genre, dtype: object

In [26]:
tfidf = TfidfVectorizer(stop_words="english")

In [27]:
anime['genre'] = anime['genre'].fillna('')  # no effect after the dropna above; kept as a safeguard

In [28]:
tfidf_matrix = tfidf.fit_transform(anime['genre'])

In [29]:
tfidf_matrix.shape  # 12,232 anime (12,294 minus the 62 without a genre) x 46 terms

(12232, 46)

In [30]:
feature_names = tfidf.get_feature_names_out()

feature_names

array(['action', 'adventure', 'ai', 'arts', 'cars', 'comedy', 'dementia',
       'demons', 'drama', 'ecchi', 'fantasy', 'fi', 'game', 'harem',
       'hentai', 'historical', 'horror', 'josei', 'kids', 'life', 'magic',
       'martial', 'mecha', 'military', 'music', 'mystery', 'parody',
       'police', 'power', 'psychological', 'romance', 'samurai', 'school',
       'sci', 'seinen', 'shoujo', 'shounen', 'slice', 'space', 'sports',
       'super', 'supernatural', 'thriller', 'vampire', 'yaoi', 'yuri'],
      dtype=object)

In [31]:
tfidf_matrix.toarray()  # TF-IDF scores at the intersection of documents (anime) and terms (genre tokens)

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.29450574, 0.31749916, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.25046406, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

<a id = "9"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Cosine Sim Calculator✨</p>

In [32]:
cosine_sim = cosine_similarity(tfidf_matrix,
                               tfidf_matrix)

In [33]:
cosine_sim.shape

(12232, 12232)

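Stored as float64, the full 12,232 × 12,232 matrix takes roughly 1.2 GB. Because `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), cosine similarity reduces to a dot product, so the scores for a single title can be computed on demand from the sparse TF-IDF matrix. A sketch on toy genre strings, checked against `cosine_similarity`; `similarity_row` is a hypothetical helper.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy genre strings standing in for anime['genre'].
genres = [
    "Action, Adventure, Comedy, Drama, Fantasy, Shounen, Super Power",
    "Action, Adventure, Comedy, Shounen",
    "Drama, Romance, School",
]

tfidf = TfidfVectorizer(stop_words="english")  # rows are L2-normalized by default
tfidf_matrix = tfidf.fit_transform(genres)


def similarity_row(matrix, index):
    """Cosine similarity of row `index` against every row, as a 1-D array."""
    # With unit-length rows, the dot product equals the cosine similarity.
    return (matrix @ matrix[index].T).toarray().ravel()


scores = similarity_row(tfidf_matrix, 0)
print(np.round(scores, 3))

# Same values as the corresponding row of the full similarity matrix.
assert np.allclose(scores, cosine_similarity(tfidf_matrix)[0])
```
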
<a id = "10"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Recommendation Based on Similarities✨</p>

In [34]:
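# cosine_sim is indexed by row position, while anime keeps its original labels after dropna,
# so map each name to its row position
indices = pd.Series(range(len(anime)), index=anime['name'])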

In [35]:
indices = indices[~indices.index.duplicated(keep='last')]

In [36]:
movie_index = indices["One Piece"]

In [37]:
cosine_sim[movie_index]

array([0.14065349, 0.5001569 , 0.21489077, ..., 0.        , 0.        ,
       0.        ])

In [38]:
similarity_scores = pd.DataFrame(cosine_sim[movie_index],
                                 columns=["score"])

In [39]:
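# Skip the top-ranked row (assumed to be the query itself) and keep the next 10.
# Titles with identical genres tie at a score of 1.0, so the query is not guaranteed to rank first.
movie_indices = similarity_scores.sort_values("score", ascending=False)[1:11].index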

In [40]:
anime['name'].iloc[movie_indices]

241     One Piece: Episode of Nami - Koukaishi no Nami...
74                                              One Piece
896     One Piece: Episode of Sabo - 3 Kyoudai no Kizu...
2723    One Piece Movie 3: Chinjuu-jima no Chopper Oukoku
1793                 One Piece Movie 5: Norowareta Seiken
352                One Piece Film: Strong World Episode 0
753     One Piece: Episode of Luffy - Hand Island no B...
2161                                      One Piece Recap
1795              One Piece: Umi no Heso no Daibouken-hen
1171    One Piece Movie 9: Episode of Chopper Plus - F...
Name: name, dtype: object

<center><img src="https://i.imgur.com/LLXhQ2c.jpg" width="800" height="800"></center>