<h1 align="center">🎌 Anime Feature Extraction 🎌</h1>

<h1 align="center">Introduction</h1>

This project extracts features from an anime dataset to prepare it for machine learning tasks such as recommendation or classification.

The raw `Title` column fuses several fields into a single string: the show's name, its type, the episode count, the air dates, and the member count. The objective is to split this raw categorical, textual, and numerical data into structured, ML-ready features using Python.


### Importing Tools

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plot
import seaborn as sn

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', None)

### Reading Data

In [2]:
data=pd.read_csv('anime.csv')
data.head()

Unnamed: 0,Rank,Title,Score
0,1,"Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview",9.1
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",9.07
2,3,"Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members",9.06
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 members",9.06
4,5,"Shingeki no Kyojin Season 3 Part 2TV (10 eps)Apr 2019 - Jul 20192,146,679 membersManga StoreVolume 1€10.99Preview",9.05


### Info of the Given Data

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Rank    50 non-null     int64  
 1   Title   50 non-null     object 
 2   Score   50 non-null     float64
dtypes: float64(1), int64(1), object(1)
memory usage: 1.3+ KB


### Extracting Episode Count

In [4]:
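def extract_ep(title):
    # Coerce to str; the original call discarded the result of str(title).
    title = str(title)
    # The episode count sits inside the first parenthesised group, e.g. "(64 eps)".
    start = title.index('(') + 1
    end = title.index(')', start)
    # Keep only the number in front of "eps".
    return int(title[start:end].split(' ')[0])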

In [5]:
data['Episode_Count']=data['Title'].apply(extract_ep)
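
The same extraction can also be expressed without a Python-level loop. The following is a minimal sketch, assuming every title carries a `(N eps)` group as in the preview above; the third sample string is hypothetical and only shows how a missing group is handled.

```python
import pandas as pd

# The first two titles are copied from the dataset preview;
# the third is a hypothetical title without an episode group.
sample = pd.Series([
    "Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",
    "Gintama: The FinalMovie (1 eps)Jan 2021 - Jan 2021137,208 members",
    "Title without episode info",
])

# Capture the digits inside "(N eps)"; rows without a match become NaN.
episodes = sample.str.extract(r"\((\d+) eps\)", expand=False)

# The nullable Int64 dtype keeps the missing value instead of raising.
episodes = pd.to_numeric(episodes, errors="coerce").astype("Int64")
print(episodes)
```

Compared with the character loop, a title that does not match yields a missing value rather than an exception, which may or may not be the behaviour you want.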

In [6]:
data.head(3)

Unnamed: 0,Rank,Title,Score,Episode_Count
0,1,"Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview",9.1,64
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",9.07,24
2,3,"Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members",9.06,13


In [7]:
data.head(10)['Title']

0       Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview
1                                                           Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members
2                                               Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members
3                                                                Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 members
4    Shingeki no Kyojin Season 3 Part 2TV (10 eps)Apr 2019 - Jul 20192,146,679 membersManga StoreVolume 1€10.99Preview
5                                                                Gintama'TV (51 eps)Apr 2011 - Mar 2012534,105 members
6                                                    Gintama: The FinalMovie (1 eps)Jan 2021 - Jan 2021137,208 members
7                                                     Hunter x Hunter TV (148 eps)Oct 2011 - Sep 20142,701,154 members
8                              Kaguya-sama 

### Extracting Time Stamp

In [8]:
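def extract_TS(title):
    # The air dates follow the first ')' and always span 19 characters,
    # e.g. "Apr 2009 - Jul 2010".
    start = title.index(')') + 1
    # Slicing by width avoids scanning for a ',', which would fail
    # for member counts below 1,000.
    return title[start:start + 19]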

In [9]:
data['Time_Stamp']=data['Title'].apply(extract_TS)
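
The fixed width of 19 characters works because every air-date string in the preview has the shape `Mon YYYY - Mon YYYY`. As a hedged alternative, the sketch below matches that shape explicitly with a regular expression; the helper name `extract_air_dates` is hypothetical and not part of the notebook.

```python
import re

# ")" followed by "Mon YYYY - Mon YYYY"; the closing parenthesis anchors the
# match right after the episode group.
AIR_DATES = re.compile(r"\)([A-Z][a-z]{2} \d{4} - [A-Z][a-z]{2} \d{4})")

def extract_air_dates(title):
    """Return the air-date span, or None when the pattern is absent."""
    match = AIR_DATES.search(title)
    return match.group(1) if match else None

# Title copied from the dataset preview.
title = "Hunter x Hunter TV (148 eps)Oct 2011 - Sep 20142,701,154 members"
print(extract_air_dates(title))
```

Because the year is limited to four digits, the member count fused to it (`20142,701,154`) is left out of the match.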

In [10]:
data.head(4)

Unnamed: 0,Rank,Title,Score,Episode_Count,Time_Stamp
0,1,"Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview",9.1,64,Apr 2009 - Jul 2010
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",9.07,24,Apr 2011 - Sep 2011
2,3,"Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members",9.06,13,Oct 2022 - Dec 2022
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 members",9.06,51,Apr 2015 - Mar 2016


In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Rank           50 non-null     int64  
 1   Title          50 non-null     object 
 2   Score          50 non-null     float64
 3   Episode_Count  50 non-null     int64  
 4   Time_Stamp     50 non-null     object 
dtypes: float64(1), int64(2), object(2)
memory usage: 2.1+ KB


In [12]:
data['Time_Stamp']=data['Time_Stamp'].astype(str)

### Time Stamp -> Start & End

In [13]:
data["start"]=data['Time_Stamp'].str.split('-').str[0]
data["End"]=data['Time_Stamp'].str.split('-').str[1]

In [14]:
data.head()

Unnamed: 0,Rank,Title,Score,Episode_Count,Time_Stamp,start,End
0,1,"Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview",9.1,64,Apr 2009 - Jul 2010,Apr 2009,Jul 2010
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",9.07,24,Apr 2011 - Sep 2011,Apr 2011,Sep 2011
2,3,"Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members",9.06,13,Oct 2022 - Dec 2022,Oct 2022,Dec 2022
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 members",9.06,51,Apr 2015 - Mar 2016,Apr 2015,Mar 2016
4,5,"Shingeki no Kyojin Season 3 Part 2TV (10 eps)Apr 2019 - Jul 20192,146,679 membersManga StoreVolume 1€10.99Preview",9.05,10,Apr 2019 - Jul 2019,Apr 2019,Jul 2019


### Deletion of Intermediate Result

In [15]:
data.drop('Time_Stamp',axis=1,inplace=True)

In [16]:
data.head(3)

Unnamed: 0,Rank,Title,Score,Episode_Count,start,End
0,1,"Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview",9.1,64,Apr 2009,Jul 2010
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",9.07,24,Apr 2011,Sep 2011
2,3,"Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members",9.06,13,Oct 2022,Dec 2022


### Extracting Months

In [17]:
data['start_dt'] = pd.to_datetime(data['start'].str.strip(),format='%b %Y')

data['end_dt'] = pd.to_datetime(data['End'].str.strip(),format='%b %Y')

data['months'] = (
    (data['end_dt'].dt.year - data['start_dt'].dt.year) * 12 +
    (data['end_dt'].dt.month - data['start_dt'].dt.month)
)
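
To make the month arithmetic concrete, here is a small self-contained sketch that runs the same year and month difference on two air-date strings taken from the dataset. The variable names are illustrative only; splitting on `" - "` (with the surrounding spaces) is a choice made here so that no later `strip()` is needed.

```python
import pandas as pd

# Air-date spans for "Fullmetal Alchemist: Brotherhood" and "Gintama: The Final".
span = pd.Series(["Apr 2009 - Jul 2010", "Jan 2021 - Jan 2021"])

# Split once on the separator, producing one column per endpoint.
parts = span.str.split(" - ", n=1, expand=True)
start = pd.to_datetime(parts[0], format="%b %Y")
end = pd.to_datetime(parts[1], format="%b %Y")

# Whole-year difference converted to months, plus the month-of-year difference.
months = (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)
print(months.tolist())  # [15, 0]
```

For the first row this is (2010 - 2009) * 12 + (7 - 4) = 15. A show that starts and ends in the same month gets 0, so `months` measures the gap between the two calendar months rather than the number of months in which the show aired.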


In [18]:
data.drop(['start_dt','end_dt'],axis=1,inplace=True)

In [19]:
data.head(2)

Unnamed: 0,Rank,Title,Score,Episode_Count,start,End,months
0,1,"Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview",9.1,64,Apr 2009,Jul 2010,15
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",9.07,24,Apr 2011,Sep 2011,5


### Longest-Running Show in the Data

In [20]:
data[data['months']==data['months'].max()][['Title','Score']]
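
The boolean mask above returns every row that ties for the maximum. If only one row, or a ranked shortlist, is needed, `idxmax` and `nlargest` are possible alternatives. The sketch below uses a three-row frame built from values shown in this notebook; the frame name `df` is illustrative.

```python
import pandas as pd

# Names and month spans taken from the final table of this notebook.
df = pd.DataFrame({
    "Name": ["Fullmetal Alchemist: Brotherhood", "Steins;Gate", "Hunter x Hunter"],
    "months": [15, 5, 35],
})

# idxmax returns the index label of the first maximum only.
longest = df.loc[df["months"].idxmax(), "Name"]

# nlargest returns the top rows already sorted in descending order.
top_two = df.nlargest(2, "months")["Name"].tolist()

print(longest)
print(top_two)
```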

Unnamed: 0,Title,Score
11,"Ginga Eiyuu DensetsuOVA (110 eps)Jan 1988 - Mar 1997309,193 members",9.02


### Extracting Name + Type

In [21]:
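def extract_name(title):
    # Everything before the first '(' is the name fused with the show type,
    # e.g. "Steins;GateTV " (the trailing space is kept on purpose, because
    # the type checks below rely on it).
    return title[:title.index('(')]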

In [22]:
data["Name"]=data['Title'].apply(extract_name)

In [23]:
data.head(3)

Unnamed: 0,Rank,Title,Score,Episode_Count,start,End,months,Name
0,1,"Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview",9.1,64,Apr 2009,Jul 2010,15,Fullmetal Alchemist: BrotherhoodTV
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",9.07,24,Apr 2011,Sep 2011,5,Steins;GateTV
2,3,"Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members",9.06,13,Oct 2022,Dec 2022,2,Bleach: Sennen Kessen-henTV


In [24]:
print(data.loc[49]["Name"])

Rurouni Kenshin: Meiji Kenkaku Romantan - Tsuioku-henOVA 


In [25]:
data["Name"].astype(str)

0                              Fullmetal Alchemist: BrotherhoodTV 
1                                                   Steins;GateTV 
2                                     Bleach: Sennen Kessen-henTV 
3                                                      Gintama°TV 
4                            Shingeki no Kyojin Season 3 Part 2TV 
5                                                      Gintama'TV 
6                                         Gintama: The FinalMovie 
7                                              Hunter x Hunter TV 
8                    Kaguya-sama wa Kokurasetai: Ultra RomanticTV 
9                                           Gintama': EnchousenTV 
10     Shingeki no Kyojin: The Final Season - Kanketsu-henSpecial 
11                                        Ginga Eiyuu DensetsuOVA 
12                   Bleach: Sennen Kessen-hen - Ketsubetsu-tanTV 
13                                     Fruits Basket: The FinalTV 
14                                                     Gintam

### Extracting Type

In [26]:
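def extract_type(name):
    # The type tag is fused to the end of the name, followed by one space.
    if name.endswith("TV "):
        return "TV"
    elif name.endswith("OVA "):
        return "OVA"
    elif name.endswith("Special "):
        return "Special"
    else:
        # Anything without a recognised tag is assumed to be a movie.
        return "Movie"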

In [27]:
data['Show_Type']=data['Name'].apply(extract_type)

In [28]:
data.head(5)['Title']

0       Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 20103,218,472 membersManga StoreVolume 1€4.58Preview
1                                                           Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members
2                                               Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members
3                                                                Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 members
4    Shingeki no Kyojin Season 3 Part 2TV (10 eps)Apr 2019 - Jul 20192,146,679 membersManga StoreVolume 1€10.99Preview
Name: Title, dtype: object

### Extracting Members

In [29]:
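def extract_members(title):
    # Skip the ')' and the 19-character air-date string that follows it;
    # the member count starts right after, e.g. "3,218,472 members".
    start = title.index(')') + 20
    end = title.index(' ', start)
    # Drop the thousands separators so the value can be cast to a number.
    return title[start:end].replace(',', '')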

In [30]:
data['Members']=data['Title'].apply(extract_members)
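
The offset of 20 characters is needed because the end year and the member count are fused together (`20103,218,472`). A regular expression can express the same boundary more explicitly. This is only a sketch: the helper name `extract_member_count` is hypothetical, and the pattern assumes the `Mon YYYY` end date and the literal word `members` seen in the preview.

```python
import re

# " - Mon YYYY" marks the end date; the digits and commas that follow,
# up to " members", form the member count.
MEMBERS = re.compile(r" - [A-Z][a-z]{2} \d{4}([\d,]+) members")

def extract_member_count(title):
    """Return the member count as an int, or None when the pattern is absent."""
    match = MEMBERS.search(title)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

# Titles copied from the dataset preview.
fma = ("Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr 2009 - Jul 2010"
       "3,218,472 membersManga StoreVolume 1€4.58Preview")
bleach = "Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members"

print(extract_member_count(fma))
print(extract_member_count(bleach))
```

Names that themselves contain ` - ` (for example `Bleach: Sennen Kessen-hen - Ketsubetsu-tan`) do not trigger a false match, because the pattern also requires a three-letter month followed by a space and a four-digit year.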

### Type Conversion String -> Integer

In [31]:
data['Members'] = pd.to_numeric(data['Members'], errors='coerce').astype('Int64')
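
The effect of `errors='coerce'` combined with the nullable `Int64` dtype is easiest to see on a tiny example. In the sketch below the first two values come from the dataset, while `"n/a"` is a hypothetical bad value added only to show the fallback.

```python
import pandas as pd

# Two real member counts plus one hypothetical unparsable value.
raw = pd.Series(["3,218,472", "474,138", "n/a"])

# Remove thousands separators, then convert; bad values become NaN.
cleaned = raw.str.replace(",", "", regex=False)
members = pd.to_numeric(cleaned, errors="coerce").astype("Int64")

print(members)
```

With a plain `int64` cast the missing value would raise an error; `Int64` (capital I) stores it as `<NA>` and keeps the remaining values as integers.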

In [32]:
data.drop('Title',axis=1,inplace=True)

In [33]:
def remove_clippings(name):
    # Strip only the trailing type tag; str.replace would also remove
    # matches that occur in the middle of a name.
    for suffix in ("TV ", "OVA ", "Special ", "Movie "):
        if name.endswith(suffix):
            # rstrip drops the space left behind by names like "Hunter x Hunter TV ".
            return name[:-len(suffix)].rstrip()
    return name

In [34]:
data['Name']=data['Name'].apply(remove_clippings)
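
Reading the type tag and removing it are two passes over the same suffix, so they could be folded into one helper. The sketch below is one possible shape, not the notebook's method: `split_name_and_type` and `KNOWN_TYPES` are hypothetical names, and unlike `extract_type` it returns `None` for an unrecognised tag instead of defaulting to `"Movie"`.

```python
# The four type tags that appear in this notebook.
KNOWN_TYPES = ("Special", "Movie", "OVA", "TV")

def split_name_and_type(raw):
    """Split 'NameTYPE ' into (name, type); type is None if no tag matches."""
    trimmed = raw.rstrip()
    for show_type in KNOWN_TYPES:
        if trimmed.endswith(show_type):
            # Cut the tag off the end and drop any space before it.
            return trimmed[:-len(show_type)].rstrip(), show_type
    return trimmed, None

# Raw names as produced by extract_name, trailing space included.
print(split_name_and_type("Hunter x Hunter TV "))
print(split_name_and_type("Gintama: The FinalMovie "))
```

Returning `None` makes unexpected formats visible during inspection, at the cost of having to handle the missing value afterwards.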

### Reordering Features

In [35]:
data=data[
    [
        "Score",
        "Rank",
        "months",
        "Members",
        "Episode_Count",
        "Name",
        "start",
        "End",
        "Show_Type"
    ]
]

In [36]:
data.head(10)

Unnamed: 0,Score,Rank,months,Members,Episode_Count,Name,start,End,Show_Type
0,9.1,1,15,3218472,64,Fullmetal Alchemist: Brotherhood,Apr 2009,Jul 2010,TV
1,9.07,2,5,2473707,24,Steins;Gate,Apr 2011,Sep 2011,TV
2,9.06,3,2,474138,13,Bleach: Sennen Kessen-hen,Oct 2022,Dec 2022,TV
3,9.06,4,11,605113,51,Gintama°,Apr 2015,Mar 2016,TV
4,9.05,5,3,2146679,10,Shingeki no Kyojin Season 3 Part 2,Apr 2019,Jul 2019,TV
5,9.04,6,11,534105,51,Gintama',Apr 2011,Mar 2012,TV
6,9.04,7,0,137208,1,Gintama: The Final,Jan 2021,Jan 2021,Movie
7,9.04,8,35,2701154,148,Hunter x Hunter,Oct 2011,Sep 2014,TV
8,9.04,9,2,851445,13,Kaguya-sama wa Kokurasetai: Ultra Romantic,Apr 2022,Jun 2022,TV
9,9.03,10,5,313446,13,Gintama': Enchousen,Oct 2012,Mar 2013,TV


<h1 align="center"><b><i>Machine Learning Ready Dataset</i></b></h1>