Project Name: Anime Feature Extraction Demonstration

Description:
In this project we are given a data set describing various anime. The Title column packs the name, format, episode count, airing period, and member count into a single string, so we use feature extraction to perform the following tasks:

1) Make a new column for the episode count
2) Make a new column for the time stamp
3) Find which anime has the highest score
4) List the top five highest-scoring anime
5) Find which anime has the highest episode count
6) List the top five anime by episode count
7) Find the longest-running anime

In [60]:
import numpy as np
import pandas as pd

In [61]:
df = pd.read_csv(r'anime.csv')

In [62]:
df.head()

Unnamed: 0,Rank,Title,Score
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05


In [63]:
df.loc[1, 'Title']

'Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members'

1) Making a new column for episode count

In [64]:
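def episode_count(txt):
    """Return the number inside the first '(N eps)' group of a title string."""
    # Assumes the first parenthesised group in the title is the episode count.
    inside = False  # True while we are between "(" and ")"
    ep_count = ""
    for ch in txt:
        if ch == ")":
            break  # end of the first parenthesised group
        if ch == "(":
            inside = True
            continue
        if inside:
            ep_count += ch
    # Keep only the number; str.strip(" eps") strips a set of characters, not the literal suffix.
    return int(ep_count.split()[0])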

In [65]:
epcount = df['Title'].apply(episode_count)

In [66]:
df['Episodes'] = epcount  # reuse the Series computed in the previous cell

In [67]:
df.head()

Unnamed: 0,Rank,Title,Score,Episodes
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07,24
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06,13
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06,51
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05,10


In [68]:
df['Episodes'].dtype

dtype('int64')
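
As a possible alternative to the character loop, the episode count could also be pulled out with a regular expression through pandas' `Series.str.extract`. This is a minimal sketch, not the method used in this notebook: the helper name `extract_episodes` and every sample title except the Steins;Gate one are made up for illustration, and the pattern assumes the count always appears as `(N eps)` or `(N ep)`.

```python
import pandas as pd

def extract_episodes(titles: pd.Series) -> pd.Series:
    """Extract the number from '(N eps)' in each title; titles without a match become <NA>."""
    # Capture the digits that sit directly before ' ep' or ' eps' inside parentheses.
    counts = titles.str.extract(r"\((\d+) eps?\)", expand=False)
    # to_numeric converts the captured strings; the nullable Int64 dtype keeps missing values.
    return pd.to_numeric(counts, errors="coerce").astype("Int64")

sample = pd.Series([
    "Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",   # from the data set
    "Some Show (Remake)TV (12 eps)Jan 2020 - Mar 2020100 members",  # hypothetical title
    "Some MovieMovie (1 ep)Jan 2020 - Jan 2020100 members",         # hypothetical title
    "No episode info here",                                         # hypothetical title
])
print(extract_episodes(sample))
```

Unlike the loop, this sketch does not depend on the episode count being the first parenthesised group in the title, and it returns a missing value instead of raising when no count is found.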

In [69]:
df.loc[1, 'Title']

'Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members'

2) Making a new column for the time stamp

In [70]:
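def time_stamp(txt):
    """Return the airing period that follows the first ')' in a title string."""
    # Assumes the period always looks like 'Apr 2011 - Sep 2011' (19 characters).
    period_length = 19
    after_paren = False  # True once the first ")" has been seen
    data = ""
    for ch in txt:
        if ch == ")":
            after_paren = True
            continue
        if after_paren:
            data += ch
            if len(data) == period_length:
                break
    return data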

In [71]:
data = df['Title'].apply(time_stamp)

In [72]:
df['Time Stamp'] = data
df.head()

Unnamed: 0,Rank,Title,Score,Episodes,Time Stamp
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64,Apr 2009 - Jul 2010
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07,24,Apr 2011 - Sep 2011
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06,13,Oct 2022 - Dec 2022
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06,51,Apr 2015 - Mar 2016
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05,10,Apr 2019 - Jul 2019
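
The fixed length of 19 characters relies on every period having the form `Mon YYYY - Mon YYYY`. A hedged sketch of a pattern-based alternative follows; it matches that form explicitly and parses both ends into dates. The helper name `extract_period`, the constant `PERIOD_PATTERN`, and the second sample title are hypothetical, and the `%b` month parsing assumes English month abbreviations.

```python
import pandas as pd

# A ')' followed by 'Mon YYYY - Mon YYYY'; the two named groups become columns.
PERIOD_PATTERN = r"\)(?P<start>[A-Z][a-z]{2} \d{4}) - (?P<end>[A-Z][a-z]{2} \d{4})"

def extract_period(titles: pd.Series) -> pd.DataFrame:
    """Return 'start' and 'end' datetime columns parsed from each title."""
    parts = titles.str.extract(PERIOD_PATTERN)
    # Titles that do not match the pattern end up as NaT in both columns.
    parts["start"] = pd.to_datetime(parts["start"], format="%b %Y", errors="coerce")
    parts["end"] = pd.to_datetime(parts["end"], format="%b %Y", errors="coerce")
    return parts

sample = pd.Series([
    "Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473,707 members",  # from the data set
    "Title without a period (3 eps)",                              # hypothetical title
])
periods = extract_period(sample)
print(periods)
```

Having real dates instead of a string makes later steps, such as computing how long a show ran, a matter of plain date arithmetic.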


3) Which anime has the highest score

In [73]:
# General approach that does not rely on row order:
# df[df['Score'] == df['Score'].max()]['Title']
df.head(1)  # works here because the given data is already sorted by score

Unnamed: 0,Rank,Title,Score,Episodes,Time Stamp
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64,Apr 2009 - Jul 2010


4) Top five highest-scoring anime

In [74]:
top_five_scoring_anime = df.sort_values(by='Score', ascending=False).head(5)

In [75]:
top_five_scoring_anime

Unnamed: 0,Rank,Title,Score,Episodes,Time Stamp
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64,Apr 2009 - Jul 2010
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07,24,Apr 2011 - Sep 2011
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06,13,Oct 2022 - Dec 2022
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06,51,Apr 2015 - Mar 2016
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05,10,Apr 2019 - Jul 2019
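
`DataFrame.nlargest` is one way to get the same kind of top-five result in a single call. The sketch below uses a small, deliberately shuffled frame built from ranks and scores that appear in this notebook's outputs, so it illustrates the call only, not the full data set; the variable names `scores` and `top_five` are illustrative.

```python
import pandas as pd

# Ranks and scores copied from the outputs in this notebook, deliberately out of order.
scores = pd.DataFrame({
    "Rank":  [5, 3, 1, 8, 2, 4],
    "Score": [9.05, 9.06, 9.10, 9.04, 9.07, 9.06],
})

# nlargest returns the n rows with the largest values in the given column, in descending order.
top_five = scores.nlargest(5, "Score")
print(top_five)
```

Because it does not rely on the rows already being sorted, the same call would still work if the data set were loaded in a different order.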


5) Anime with the highest episode count

In [76]:
highest_episode_count = df.loc[df['Episodes'].idxmax()]

In [77]:
highest_episode_count

Rank                                                         16
Title         GintamaTV (201 eps)Apr 2006 - Mar 20101,034,41...
Score                                                      8.94
Episodes                                                    201
Time Stamp                                  Apr 2006 - Mar 2010
Name: 15, dtype: object

In [78]:
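from datetime import datetime
from dateutil.relativedelta import relativedelta
def calculate_total_months(period):
    """Return the inclusive number of months in a 'Mon YYYY - Mon YYYY' period (used for task 7)."""
    try:
        start_str, end_str = period.split(' - ')
        start_date = datetime.strptime(start_str.strip(), '%b %Y')
        end_date = datetime.strptime(end_str.strip(), '%b %Y')
        r = relativedelta(end_date, start_date)
        # +1 counts both the first and the last month, e.g. Apr 2011 - Sep 2011 is 6 months
        return r.years * 12 + r.months + 1
    except (ValueError, AttributeError):
        # Malformed or non-string period
        return None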

In [79]:
df['Months'] = df['Time Stamp'].apply(calculate_total_months)

In [80]:
df.head()

Unnamed: 0,Rank,Title,Score,Episodes,Time Stamp,Months
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64,Apr 2009 - Jul 2010,16
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07,24,Apr 2011 - Sep 2011,6
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06,13,Oct 2022 - Dec 2022,3
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06,51,Apr 2015 - Mar 2016,12
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05,10,Apr 2019 - Jul 2019,4
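
The inclusive month count can be checked by hand: months = (end year - start year) * 12 + (end month - start month) + 1. The sketch below applies that arithmetic with only the standard library to three periods from the data; `months_inclusive` is a hypothetical helper written for this check, not the function used in the notebook.

```python
from datetime import datetime

def months_inclusive(period: str) -> int:
    """Count months from the start month through the end month, both included."""
    start_str, end_str = period.split(" - ")
    start = datetime.strptime(start_str.strip(), "%b %Y")
    end = datetime.strptime(end_str.strip(), "%b %Y")
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

for period in ["Apr 2009 - Jul 2010", "Apr 2011 - Sep 2011", "Jan 1988 - Mar 1997"]:
    print(period, "->", months_inclusive(period))
# Apr 2009 - Jul 2010 -> 16
# Apr 2011 - Sep 2011 -> 6
# Jan 1988 - Mar 1997 -> 111
```

These values agree with the Months column, which suggests the `+ 1` in the notebook's function is what makes the count inclusive of the final month.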


6) Top five anime by episode count

In [81]:
df['Months'].dtype

dtype('int64')

In [82]:
top_five_episode_count = df.sort_values(by='Episodes', ascending=False).head()

In [83]:
top_five_episode_count

Unnamed: 0,Rank,Title,Score,Episodes,Time Stamp,Months
15,16,"GintamaTV (201 eps)Apr 2006 - Mar 20101,034,41...",8.94,201,Apr 2006 - Mar 2010,48
7,8,Hunter x Hunter TV (148 eps)Oct 2011 - Sep 201...,9.04,148,Oct 2011 - Sep 2014,36
11,12,Ginga Eiyuu DensetsuOVA (110 eps)Jan 1988 - Ma...,9.02,110,Jan 1988 - Mar 1997,111
42,43,Hajime no IppoTV (75 eps)Oct 2000 - Mar 200255...,8.76,75,Oct 2000 - Mar 2002,18
24,25,"MonsterTV (74 eps)Apr 2004 - Sep 20051,041,081...",8.87,74,Apr 2004 - Sep 2005,18


7) Longest-running anime

In [85]:
df.loc[df['Months'].idxmax()]

Rank                                                         12
Title         Ginga Eiyuu DensetsuOVA (110 eps)Jan 1988 - Ma...
Score                                                      9.02
Episodes                                                    110
Time Stamp                                  Jan 1988 - Mar 1997
Months                                                      111
Name: 11, dtype: object