# Table of Contents:
# 1. Introduction to Pandas
# 2. Creating Series and DataFrames
# 3. Loading and Exploring Data
# 4. Data Selection and Indexing
# 5. Data Cleaning and Handling Missing Values
# 6. Data Filtering and Querying
# 7. Data Aggregation and Grouping
# 8. Data Merging and Joining
# 9. Pivot Tables and Reshaping
# 10. Advanced Operations

In [3]:
import pandas as pd
import numpy as np
print("Pandas version:", pd.__version__)

Pandas version: 2.2.3


# ===========================================================
# 1. INTRODUCTION TO PANDAS
# ===========================================================


Pandas is a powerful data manipulation and analysis library for Python.
It provides two main data structures:
- Series: 1-dimensional labeled array
- DataFrame: 2-dimensional labeled data structure (like a spreadsheet)

Key Features:
- Easy handling of missing data
- Data alignment and merging
- Flexible reshaping and pivoting
- Powerful grouping functionality
- Time series functionality
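
As a small illustration of the "data alignment" feature listed above, the sketch below adds two Series whose labels only partly overlap. The store names and sales figures are made up for this example.

```python
import pandas as pd

# Hypothetical quarterly sales for two stores; the labels only partly overlap.
store_a = pd.Series({'Q1': 100, 'Q2': 120, 'Q3': 90})
store_b = pd.Series({'Q2': 80, 'Q3': 110, 'Q4': 95})

# Addition aligns on index labels; a label present in only one Series gives NaN.
total = store_a + store_b
print(total)

# fill_value=0 treats a missing label as zero instead of producing NaN.
total_filled = store_a.add(store_b, fill_value=0)
print(total_filled)
```

Alignment happens on labels, not on position, which is why Q1 and Q4 come out as NaN in the plain sum.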


# =================================================================
# 2. CREATING SERIES AND DATAFRAMES
# =================================================================

In [4]:
print("="*60)
print("2. CREATING SERIES AND DATAFRAMES")
print("="*60)

# Creating a Series
print("\n2.1 Creating Series:")
print("-" * 30)

# From a list
ages = pd.Series([25, 30, 35, 28, 45], name='Age')
print("Series from list:")
print(ages)

# From a dictionary
person_ages = pd.Series({'Yojjal': 25, 'Ram': 30, 'Shyam': 35, 'Hari': 28})
print("\nSeries from dictionary:")
print(person_ages)

# Series attributes
print(f"\nSeries shape: {ages.shape}")
print(f"Series dtype: {ages.dtype}")
print(f"Series index: {ages.index.tolist()}")
print(f"Series values: {ages.values}")

2. CREATING SERIES AND DATAFRAMES

2.1 Creating Series:
------------------------------
Series from list:
0    25
1    30
2    35
3    28
4    45
Name: Age, dtype: int64

Series from dictionary:
Yojjal    25
Ram       30
Shyam     35
Hari      28
dtype: int64

Series shape: (5,)
Series dtype: int64
Series index: [0, 1, 2, 3, 4]
Series values: [25 30 35 28 45]
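
Once a Series exists, its elements can be read by label or by position. The sketch below reuses the dictionary-based Series from the cell above; the variable names by_label, by_position, over_27 and next_year are introduced only for illustration.

```python
import pandas as pd

person_ages = pd.Series({'Yojjal': 25, 'Ram': 30, 'Shyam': 35, 'Hari': 28})

by_label = person_ages.loc['Ram']         # label-based lookup
by_position = person_ages.iloc[0]         # position-based lookup (first element)
over_27 = person_ages[person_ages > 27]   # boolean mask keeps matching labels
next_year = person_ages + 1               # arithmetic applies element-wise

print(by_label, by_position)
print(over_27)
print(next_year)
```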


In [5]:
# Creating a DataFrame
print("\n2.2 Creating DataFrames:")
print("-" * 30)

# From a dictionary
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 45],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'Salary': [50000, 60000, 75000, 55000, 80000]
}

df_sample = pd.DataFrame(data_dict)
print("DataFrame from dictionary:")
print(df_sample)

# From lists
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
df_from_lists = pd.DataFrame({'Name': names, 'Age': ages})
print("\nDataFrame from lists:")
print(df_from_lists)

# DataFrame attributes
print(f"\nDataFrame shape: {df_sample.shape}")
print(f"DataFrame columns: {df_sample.columns.tolist()}")
print(f"DataFrame index: {df_sample.index.tolist()}")
print(f"DataFrame dtypes:\n{df_sample.dtypes}")


2.2 Creating DataFrames:
------------------------------
DataFrame from dictionary:
      Name  Age      City  Salary
0    Alice   25  New York   50000
1      Bob   30    London   60000
2  Charlie   35     Paris   75000
3    Diana   28     Tokyo   55000
4      Eve   45    Sydney   80000

DataFrame from lists:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

DataFrame shape: (5, 4)
DataFrame columns: ['Name', 'Age', 'City', 'Salary']
DataFrame index: [0, 1, 2, 3, 4]
DataFrame dtypes:
Name      object
Age        int64
City      object
Salary     int64
dtype: object
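
A DataFrame can also be built from a list of row dictionaries, which is a common shape for JSON-like data. This is a minimal sketch with made-up records; a key missing from one record simply becomes NaN in that row.

```python
import pandas as pd

# Hypothetical records; the second one has no 'City' key.
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Paris'},
]

df_records = pd.DataFrame(records)
print(df_records)
print(df_records.isnull().sum())
```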


# =======================================
# 3. LOADING AND EXPLORING DATA
# =======================================



In [8]:
# Load the Anime dataset
Anime=pd.read_csv('Anime.csv')

In [9]:
Anime

Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,Kimetsu no Yaiba: Yuukaku-hen,TV,,ufotable,Fall,"Action, Adventure, Fantasy, Shounen, Demons, H...",4.60,2021.0,,'Tanjiro and his friends accompany the Hashira...,Explicit Violence,Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ...","Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...","Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,Fruits Basket the Final,TV,13.0,TMS Entertainment,Spring,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",4.60,2021.0,,'The final arc of Fruits Basket.',"Emotional Abuse,, Mature Themes,, Physical Abu...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Natsuki Takaya : Original Creator, Yoshihide I..."
2,3,Mo Dao Zu Shi 3,The Founder of Diabolism 3,Web,12.0,B.C MAY PICTURES,,"Fantasy, Ancient China, Chinese Animation, Cul...",4.58,2021.0,,'The third season of Mo Dao Zu Shi.',,Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q","Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...","Mo Xiang Tong Xiu : Original Creator, Xiong Ke..."
3,4,Fullmetal Alchemist: Brotherhood,Hagane no Renkinjutsushi: Full Metal Alchemist,TV,64.0,Bones,Spring,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",4.58,2009.0,2010.0,"""The foundation of alchemy is based on the law...","Animal Abuse,, Mature Themes,, Violence,, Dome...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,Shingeki no Kyojin Season 3: Part II,TV,10.0,WIT Studio,Spring,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",4.57,2019.0,,'The battle to retake Wall Maria begins now! W...,"Cannibalism,, Explicit Violence","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A...","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Hajime Isayama : Original Creator, Tetsurou Ar..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18490,18491,Qin Shi Mingyue: Canghai Hengliu Xiaomeng Spec...,,Web,2.0,Sparkly Key Animation Studio,,"Action, Ancient China, Chinese Animation, Hist...",,2020.0,,Special episodes of Qin Shi Mingyue: Canghai H...,,,Qin Shi Mingyue: Canghai Hengliu,,
18491,18492,Yi Tang Juchang: Sanguo Yanyi,,TV,108.0,,,Chinese Animation,,2010.0,,No synopsis yet - check back soon!,,,,,
18492,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,,TV,13.0,,,"Chinese Animation, Family Friendly, Short Epis...",,2020.0,,No synopsis yet - check back soon!,,,,,
18493,18494,Chengshi Jiyi Wo Men de Jieri,,TV,,,,"Chinese Animation, Family Friendly, Short Epis...",,2020.0,,No synopsis yet - check back soon!,,,,,
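
read_csv accepts options that are often useful on a wide file like this one. The sketch below reads a tiny in-memory CSV (made-up rows, not the real Anime.csv) so that it runs without the data file; usecols limits the columns loaded and na_values marks extra strings as missing.

```python
import io
import pandas as pd

# Hypothetical miniature CSV standing in for a real file on disk.
csv_text = """Rank,Name,Type,Episodes,Rating
1,Show A,TV,12,4.6
2,Show B,Movie,,4.5
3,Show C,Web,24,unknown
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Rank', 'Name', 'Episodes', 'Rating'],  # skip columns you do not need
    na_values=['unknown'],                           # treat this string as missing
)
print(sample)
print(sample.dtypes)
```

Because the Episodes column contains a missing value, pandas stores it as float64 rather than int64, which is the same effect visible in the real dataset.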


In [10]:
# Basic information about the dataset
print(f"Dataset shape: {Anime.shape}")
print(f"Number of rows: {len(Anime)}")
print(f"Number of columns: {len(Anime.columns)}")

Dataset shape: (18495, 17)
Number of rows: 18495
Number of columns: 17


In [11]:
# Display first few rows
print("\nFirst 5 rows:")
Anime.head()


First 5 rows:


Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,Kimetsu no Yaiba: Yuukaku-hen,TV,,ufotable,Fall,"Action, Adventure, Fantasy, Shounen, Demons, H...",4.6,2021.0,,'Tanjiro and his friends accompany the Hashira...,Explicit Violence,Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ...","Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...","Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,Fruits Basket the Final,TV,13.0,TMS Entertainment,Spring,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",4.6,2021.0,,'The final arc of Fruits Basket.',"Emotional Abuse,, Mature Themes,, Physical Abu...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Natsuki Takaya : Original Creator, Yoshihide I..."
2,3,Mo Dao Zu Shi 3,The Founder of Diabolism 3,Web,12.0,B.C MAY PICTURES,,"Fantasy, Ancient China, Chinese Animation, Cul...",4.58,2021.0,,'The third season of Mo Dao Zu Shi.',,Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q","Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...","Mo Xiang Tong Xiu : Original Creator, Xiong Ke..."
3,4,Fullmetal Alchemist: Brotherhood,Hagane no Renkinjutsushi: Full Metal Alchemist,TV,64.0,Bones,Spring,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",4.58,2009.0,2010.0,"""The foundation of alchemy is based on the law...","Animal Abuse,, Mature Themes,, Violence,, Dome...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,Shingeki no Kyojin Season 3: Part II,TV,10.0,WIT Studio,Spring,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",4.57,2019.0,,'The battle to retake Wall Maria begins now! W...,"Cannibalism,, Explicit Violence","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A...","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Hajime Isayama : Original Creator, Tetsurou Ar..."


In [12]:
# Display last few rows
print("\nLast 5 rows:")
Anime.tail()



Last 5 rows:


Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
18490,18491,Qin Shi Mingyue: Canghai Hengliu Xiaomeng Spec...,,Web,2.0,Sparkly Key Animation Studio,,"Action, Ancient China, Chinese Animation, Hist...",,2020.0,,Special episodes of Qin Shi Mingyue: Canghai H...,,,Qin Shi Mingyue: Canghai Hengliu,,
18491,18492,Yi Tang Juchang: Sanguo Yanyi,,TV,108.0,,,Chinese Animation,,2010.0,,No synopsis yet - check back soon!,,,,,
18492,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,,TV,13.0,,,"Chinese Animation, Family Friendly, Short Epis...",,2020.0,,No synopsis yet - check back soon!,,,,,
18493,18494,Chengshi Jiyi Wo Men de Jieri,,TV,,,,"Chinese Animation, Family Friendly, Short Epis...",,2020.0,,No synopsis yet - check back soon!,,,,,
18494,18495,Heisei Inu Monogatari Bow: Genshi Inu Monogata...,,Movie,,Nippon Animation,,"Comedy, Slice of Life, Dogs",,1994.0,,No synopsis yet - check back soon!,,,Heisei Inu Monogatari Bow,,


In [13]:
# Display random sample
print("\nRandom sample of 3 rows:")
Anime.sample(3)



Random sample of 3 rows:


Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
6658,6659,Buzzer Beater 2nd Season,,TV,13.0,TMS Entertainment,Summer,"Sci Fi, Shounen, Sports, Aliens, Basketball, B...",3.41,2007.0,,"""After being drafted into the Earth Team - a b...",,Buzzer Beater,Buzzer Beater,"Cha-Che : Sanae Kobayashi, Dt : Yuuji Ueda, Ha...","Takehiko Inoue : Original Creator, Shigeyuki M..."
1419,1420,Tokyo Ravens,,TV,24.0,8-Bit,Fall,"Action, Magic, Magic School, School Life, Supe...",3.88,2013.0,2014.0,"""As a descendant of the famous Tsuchimikado sh...",,"Tokyo Ravens, Tokyo Ravens (Light Novel), Toky...",Tokyo Ravens Specials,"Harutora Tsuchimikado : Kaito Ishikawa, Natsum...","Takaomi Kanasaki : Director, Maiko Iuchi : Mus..."
15376,15377,ROXY x Masanobu Featuring Stephanie Gilmore,,Web,1.0,,,"No Dialogue, Promotional, Shorts",,2018.0,,No synopsis yet - check back soon!,,,,"Stephanie Gilmore, Masanobu Hiraoka\nDirector",Masanobu Hiraoka : Director


In [14]:
# Dataset info
print(Anime.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18495 entries, 0 to 18494
Data columns (total 17 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Rank             18495 non-null  int64  
 1   Name             18495 non-null  object 
 2   Japanese_name    7938 non-null   object 
 3   Type             18495 non-null  object 
 4   Episodes         9501 non-null   float64
 5   Studio           12018 non-null  object 
 6   Release_season   4116 non-null   object 
 7   Tags             18095 non-null  object 
 8   Rating           15364 non-null  float64
 9   Release_year     18112 non-null  float64
 10  End_year         2854 non-null   float64
 11  Description      18491 non-null  object 
 13  Related_Mange    7627 non-null   object 
 14  Related_anime    10063 non-null  object 
 15  Voice_actors     15309 non-null  object 
 16  staff            13005 non-null  object 
dtypes: float64(4), int64(1), object(12)
memory usage: 2.4+ MB


In [15]:
# Describe numerical columns
Anime.describe()

Unnamed: 0,Rank,Episodes,Rating,Release_year,End_year
count,18495.0,9501.0,15364.0,18112.0,2854.0
mean,9248.0,20.92085,3.355133,2006.520318,2004.256132
std,5339.19095,37.990858,0.400624,15.189537,13.257484
min,1.0,1.0,0.96,1907.0,1962.0
25%,4624.5,2.0,3.13,2001.0,1996.0
50%,9248.0,12.0,3.36,2012.0,2007.0
75%,13871.5,26.0,3.59,2017.0,2015.0
max,18495.0,800.0,4.6,2023.0,2022.0
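
describe() summarizes only numeric columns by default. A possible way to summarize the text columns as well is to exclude the numeric ones, sketched here on a small made-up frame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two text columns and one numeric column.
sample = pd.DataFrame({
    'Type': ['TV', 'TV', 'Movie', 'TV'],
    'Studio': ['Bones', None, 'Bones', 'ufotable'],
    'Rating': [4.6, 4.5, 4.4, 4.3],
})

# For non-numeric columns describe() reports count, unique, top and freq.
text_summary = sample.describe(exclude=[np.number])
print(text_summary)
```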


In [16]:
# Check for missing values
print(Anime.isnull().sum())

Rank                   0
Name                   0
Japanese_name      10557
Type                   0
Episodes            8994
Studio              6477
Release_season     14379
Tags                 400
Rating              3131
Release_year         383
End_year           15641
Description            4
Related_Mange      10868
Related_anime       8432
Voice_actors        3186
staff               5490
dtype: int64


In [17]:
# Check data types
print(Anime.dtypes)

Rank                 int64
Name                object
Japanese_name       object
Type                object
Episodes           float64
Studio              object
Release_season      object
Tags                object
Rating             float64
Release_year       float64
End_year           float64
Description         object
Related_Mange       object
Related_anime       object
Voice_actors        object
staff               object
dtype: object
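
Columns such as Episodes and Release_year show up as float64 because NaN cannot be stored in a plain int64 column. If whole numbers are preferred, one option is the nullable 'Int64' dtype, sketched below on made-up values.

```python
import pandas as pd

# Hypothetical episode counts with one missing value.
episodes = pd.Series([13.0, None, 64.0], name='Episodes')
print(episodes.dtype)

# 'Int64' (capital I) is the nullable integer dtype; missing values become <NA>.
episodes_int = episodes.astype('Int64')
print(episodes_int)
```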


# =====================================================
# 4. DATA SELECTION AND INDEXING
# =====================================================


In [21]:
print("4. DATA SELECTION AND INDEXING")
print("="*60)

# Column selection
print("\n4.1 Column Selection:")
print("-" * 30)

# Single column (returns Series)
Rank_series = Anime['Rank']
print(f"Type of single column selection: {type(Rank_series)}")
print(f"First 5 Rank: {Rank_series.head().tolist()}")

4. DATA SELECTION AND INDEXING

4.1 Column Selection:
------------------------------
Type of single column selection: <class 'pandas.core.series.Series'>
First 5 Rank: [1, 2, 3, 4, 5]


In [None]:
# Multiple columns (returns DataFrame)
Anime_info = Anime[['Rank', 'Name', 'Type']]
print(f"\nType of multiple column selection: {type(Anime_info)}")
print("First 3 rows of Anime info:")
print(Anime_info.head(3))



Type of multiple column selection: <class 'pandas.core.frame.DataFrame'>
First 3 rows of Anime info:
   Rank                                               Name   Type
0     1  Demon Slayer: Kimetsu no Yaiba - Entertainment...  TV   
1     2                     Fruits Basket the Final Season  TV   
2     3                                    Mo Dao Zu Shi 3  Web  


In [25]:
# Row selection using iloc (integer-location based)
print("\n4.2 Row Selection with iloc:")
print("-" * 30)

# Single row
first_Anime =Anime.iloc[0]
print(f"First Anime:\n{first_Anime}")
print(f"\nType of row selection: {type(first_Anime)}")



4.2 Row Selection with iloc:
------------------------------
First Anime:
Rank                                                               1
Name               Demon Slayer: Kimetsu no Yaiba - Entertainment...
Japanese_name                          Kimetsu no Yaiba: Yuukaku-hen
Type                                                           TV   
Episodes                                                         NaN
Studio                                                      ufotable
Release_season                                                 Fall 
Tags               Action, Adventure, Fantasy, Shounen, Demons, H...
Rating                                                           4.6
Release_year                                                  2021.0
End_year                                                         NaN
Description        'Tanjiro and his friends accompany the Hashira...
Related_Mange                         Demon Slayer: Kimetsu no Yaiba
Related_anime      Demon Slay

In [28]:
# Multiple rows
first_five = Anime.iloc[0:5]
print(f"\nFirst 5 Anime (shape: {first_five.shape}):")
print(first_five)
print(first_five[['Rank', 'Name', 'Type']])


First 5 Anime (shape: (5, 17)):
   Rank                                               Name  \
0     1  Demon Slayer: Kimetsu no Yaiba - Entertainment...   
1     2                     Fruits Basket the Final Season   
2     3                                    Mo Dao Zu Shi 3   
3     4                   Fullmetal Alchemist: Brotherhood   
4     5                Attack on Titan 3rd Season: Part II   

                                     Japanese_name   Type  Episodes  \
0                    Kimetsu no Yaiba: Yuukaku-hen  TV          NaN   
1                          Fruits Basket the Final  TV         13.0   
2                       The Founder of Diabolism 3  Web        12.0   
3   Hagane no Renkinjutsushi: Full Metal Alchemist  TV         64.0   
4             Shingeki no Kyojin Season 3: Part II  TV         10.0   

              Studio Release_season  \
0           ufotable          Fall    
1  TMS Entertainment         Spring   
2   B.C MAY PICTURES            NaN   
3          

In [29]:
# Specific rows
specific_rows = Anime.iloc[[0, 5, 10]]
print(f"\nSpecific rows (0, 5, 10):")
print(specific_rows[['Rank', 'Name', 'Episodes']])

# Row and column selection
print("\n4.3 Row and Column Selection:")
print("-" * 30)

# Select specific rows and columns
subset = Anime.iloc[0:5, 1:4]  # First 5 rows, columns 1-3
print("Subset (first 5 rows, columns 1-3):")
print(subset)


Specific rows (0, 5, 10):
    Rank                                               Name  Episodes
0      1  Demon Slayer: Kimetsu no Yaiba - Entertainment...       NaN
5      6                                     Jujutsu Kaisen      24.0
10    11                                         your name.       NaN

4.3 Row and Column Selection:
------------------------------
Subset (first 5 rows, columns 1-3):
                                                Name  \
0  Demon Slayer: Kimetsu no Yaiba - Entertainment...   
1                     Fruits Basket the Final Season   
2                                    Mo Dao Zu Shi 3   
3                   Fullmetal Alchemist: Brotherhood   
4                Attack on Titan 3rd Season: Part II   

                                     Japanese_name   Type  
0                    Kimetsu no Yaiba: Yuukaku-hen  TV     
1                          Fruits Basket the Final  TV     
2                       The Founder of Diabolism 3  Web    
3   Hagane no Renk

In [30]:
# Using column names with loc
Name_tags_subset = Anime.loc[0:4, ['Name', 'Tags']]
print(f"\nUsing loc with column names:")
print(Name_tags_subset)


Using loc with column names:
                                                Name  \
0  Demon Slayer: Kimetsu no Yaiba - Entertainment...   
1                     Fruits Basket the Final Season   
2                                    Mo Dao Zu Shi 3   
3                   Fullmetal Alchemist: Brotherhood   
4                Attack on Titan 3rd Season: Part II   

                                                Tags  
0  Action, Adventure, Fantasy, Shounen, Demons, H...  
1  Drama, Fantasy, Romance, Shoujo, Animal Transf...  
2  Fantasy, Ancient China, Chinese Animation, Cul...  
3  Action, Adventure, Drama, Fantasy, Mystery, Sh...  
4  Action, Fantasy, Horror, Shounen, Dark Fantasy...  
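
Note that Anime.loc[0:4] above returns five rows while Anime.iloc[0:5] was needed for the same result: loc slices by label and includes the end label, iloc slices by position and excludes the end. The sketch below makes the difference visible with a made-up frame whose index is not 0, 1, 2, ...

```python
import pandas as pd

# Hypothetical frame with a non-default index so labels and positions differ.
df = pd.DataFrame(
    {'Name': list('abcde'), 'Rank': [1, 2, 3, 4, 5]},
    index=[10, 20, 30, 40, 50],
)

by_position = df.iloc[0:2]   # positions 0 and 1; the end position is excluded
by_label = df.loc[10:30]     # labels 10, 20 and 30; the end label is included

print(by_position)
print(by_label)
```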


# =====================================================
# 5. DATA CLEANING AND HANDLING MISSING VALUES
# =====================================================

In [31]:
print("\n" + "="*60)
print("5. DATA CLEANING AND HANDLING MISSING VALUES")
print("="*60)

print("\n5.1 Identifying Missing Values:")
print("-" * 30)

# Check for missing values
missing_counts = Anime.isnull().sum()
print("Missing values per column:")
print(missing_counts[missing_counts > 0])


5. DATA CLEANING AND HANDLING MISSING VALUES

5.1 Identifying Missing Values:
------------------------------
Missing values per column:
Japanese_name      10557
Episodes            8994
Studio              6477
Release_season     14379
Tags                 400
Rating              3131
Release_year         383
End_year           15641
Description            4
Related_Mange      10868
Related_anime       8432
Voice_actors        3186
staff               5490
dtype: int64


In [32]:
# Percentage of missing values
missing_percentage = (Anime.isnull().sum() / len(Anime)) * 100
print("\nPercentage of missing values:")
print(missing_percentage[missing_percentage > 0])


Percentage of missing values:
Japanese_name      57.080292
Episodes           48.629359
Studio             35.020276
Release_season     77.745337
Tags                2.162747
Rating             16.928900
Release_year        2.070830
End_year           84.568802
Description         0.021627
Related_Mange      58.761828
Related_anime      45.590700
Voice_actors       17.226277
staff              29.683698
dtype: float64
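
The same percentages can be obtained a little more directly, since the mean of a boolean mask is the fraction of True values. A minimal sketch on made-up data, with the result sorted so the worst columns come first:

```python
import pandas as pd

# Hypothetical frame with different amounts of missing data per column.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Rating': [4.6, None, None, 4.0],
    'Studio': [None, 'Bones', 'ufotable', 'Bones'],
})

missing_pct = df.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```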


In [33]:
print("\n5.2 Handling Missing Values:")
print("-" * 30)

# Create a copy for cleaning
Anime_clean = Anime.copy()

# Method 1: Drop rows with missing values
print(f"Original shape: {Anime_clean.shape}")
Anime_no_na = Anime_clean.dropna()
print(f"After dropping all NAs: {Anime_no_na.shape}")


5.2 Handling Missing Values:
------------------------------
Original shape: (18495, 17)
After dropping all NAs: (40, 17)
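
Dropping every row that has any missing value leaves only 40 of 18495 rows here, which is rarely what you want. dropna also accepts subset and thresh to be more selective; the sketch below shows both on a small made-up frame.

```python
import pandas as pd

# Hypothetical frame where every row is missing at least one value.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Rating': [4.6, None, 4.4, None],
    'Studio': ['Bones', None, None, 'ufotable'],
    'End_year': [None, None, 2010.0, None],
})

all_complete = df.dropna()               # requires every column to be present
rated = df.dropna(subset=['Rating'])     # only Rating must be present
mostly = df.dropna(thresh=2)             # keep rows with at least 2 non-missing values

print(len(all_complete), len(rated), len(mostly))
```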


In [42]:
# Method 2: Drop specific columns with missing values
Anime_drop_cols = Anime_clean.dropna(axis=1)
print(f"After dropping columns with NAs: {Anime_drop_cols.shape}")

After dropping columns with NAs: (18495, 5)


In [41]:
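# Fill categorical missing values with mode
if 'staff' in Anime_clean.columns:
    mode_staff = Anime_clean['staff'].mode()[0]
    # Assign the result back: fillna(inplace=True) on a selected column
    # emits a FutureWarning in pandas 2.2 (chained assignment)
    Anime_clean['staff'] = Anime_clean['staff'].fillna(mode_staff)
    print(f"Filled missing staff with mode: {mode_staff}")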


Filled missing staff with mode: Pinocchio-P : Producer
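
Several columns can be filled in one call by passing a dictionary of column-to-value pairs to fillna. This is a sketch on made-up data; which statistic suits each column (mode, median, a constant) is a judgment call and not something this notebook prescribes.

```python
import pandas as pd

# Hypothetical frame with one missing value per column.
df = pd.DataFrame({
    'Studio': ['Bones', None, 'Bones', 'ufotable'],
    'Episodes': [12.0, None, 24.0, 64.0],
})

fill_values = {
    'Studio': df['Studio'].mode()[0],      # most frequent category
    'Episodes': df['Episodes'].median(),   # middle value, less sensitive to outliers
}

# fillna returns a new DataFrame; df itself is left unchanged.
filled = df.fillna(fill_values)
print(filled)
```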


In [44]:
# Fill numerical missing values with the median
if 'Episodes' in Anime_clean.columns:
    median_episodes = Anime_clean['Episodes'].median()
    Anime_clean['Episodes'] = Anime_clean['Episodes'].fillna(median_episodes)


In [45]:
# Check missing values after cleaning
print(f"\nMissing values after cleaning: {Anime_clean.isnull().sum()}")


Missing values after cleaning: Rank                   0
Name                   0
Japanese_name      10557
Type                   0
Episodes               0
Studio              6477
Release_season     14379
Tags                 400
Rating              3131
Release_year         383
End_year           15641
Description            4
Related_Mange      10868
Related_anime       8432
Voice_actors        3186
staff                  0
dtype: int64


# =====================================================
# 6. DATA FILTERING AND QUERYING
# =====================================================


In [60]:
print("6. DATA FILTERING AND QUERYING")

print("\n6.1 Basic Filtering:")

# Single condition
# Rating tops out at 4.6 in this dataset (see describe() above),
# so a threshold of 10 matches no rows
high_rated = Anime_clean[Anime_clean['Rating'] >= 10]
print(f"Number of Anime (Rating >= 10): {len(high_rated)}")

6. DATA FILTERING AND QUERYING

6.1 Basic Filtering:
Number of Anime (Rating >= 10): 0
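
Conditions can be combined with & (and) and | (or), with each condition wrapped in parentheses, or written as a string for DataFrame.query. The sketch below uses a small made-up frame with the same column names as the dataset.

```python
import pandas as pd

# Hypothetical titles for illustration.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Type': ['TV', 'Movie', 'TV', 'Web'],
    'Rating': [4.6, 4.5, 3.2, 4.1],
    'Episodes': [12, 1, 24, 10],
})

# Boolean masks: parentheses are required around each comparison.
mask_result = df[(df['Type'] == 'TV') & (df['Rating'] >= 4.0)]
either = df[(df['Rating'] >= 4.5) | (df['Episodes'] > 20)]

# The same "and" filter expressed with query().
query_result = df.query("Type == 'TV' and Rating >= 4.0")

print(mask_result)
print(either)
```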


In [61]:
# Display the first 3 rows
print("First 3 Animes:")
print(Anime[['Rating', 'Name', 'staff']].head(3))


First 3 Animes:
   Rating                                               Name  \
0    4.60  Demon Slayer: Kimetsu no Yaiba - Entertainment...   
1    4.60                     Fruits Basket the Final Season   
2    4.58                                    Mo Dao Zu Shi 3   

                                               staff  
0  Koyoharu Gotouge : Original Creator, Haruo Sot...  
1  Natsuki Takaya : Original Creator, Yoshihide I...  
2  Mo Xiang Tong Xiu : Original Creator, Xiong Ke...  


In [65]:
print("\n6.2 Using isin() Method:")
print("-" * 30)

# Filter using isin()
Season = Anime[Anime['Release_season'].isin(['Fall', 'Spring'])]
# Count the filtered rows, not the full DataFrame
print(f"Fall and spring season: {len(Season)}")


6.2 Using isin() Method:
------------------------------
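
isin() compares exact values, so stray whitespace makes a value fail to match. Some printed values earlier in this notebook look padded (for example 'TV   '), which may point to trailing spaces in the raw data; that is an inference from the display, not something verified here. The sketch below uses made-up values to show the effect and a possible fix with str.strip().

```python
import pandas as pd

# Hypothetical season labels; the first one carries a trailing space.
seasons = pd.Series(['Fall ', 'Spring', None, 'Winter', 'Fall'], name='Release_season')

raw_mask = seasons.isin(['Fall', 'Spring'])                  # 'Fall ' is not matched
clean_mask = seasons.str.strip().isin(['Fall', 'Spring'])    # strip whitespace first
not_selected = seasons[~clean_mask]                          # ~ negates the mask

print(raw_mask.sum(), clean_mask.sum())
print(not_selected)
```

Missing values never match isin(), so they end up in the negated selection.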


In [67]:
# =============================================================================
# 7. DATA AGGREGATION AND GROUPING
# =============================================================================

print("\n" + "="*60)
print("7. DATA AGGREGATION AND GROUPING")
print("="*60)

print("\n7.1 Basic Aggregations:")
print("-" * 30)

# Basic statistics
if 'Rating' in Anime_clean.columns:
    print(f"Mean Rating: {Anime_clean['Rating'].mean():.1f}")
    print(f"Median Rating: {Anime_clean['Rating'].median():.1f}")
    print(f"Standard deviation of Rating: {Anime_clean['Rating'].std():.1f}")

if 'Episodes' in Anime_clean.columns:
    print(f"Average Episodes: {Anime_clean['Episodes'].mean():.2f}")
    print(f"Maximum Episodes: {Anime_clean['Episodes'].max():.2f}")


7. DATA AGGREGATION AND GROUPING

7.1 Basic Aggregations:
------------------------------
Mean Rating: 3.4
Median Rating: 3.4
Standard deviation of Rating: 0.4
Average Episodes: 16.58
Maximum Episodes: 800.00


In [71]:
# Multiple aggregations
print("\n7.4 Multiple Aggregation Functions:")
print("-" * 30)

if 'Episodes' in Anime_clean.columns and 'Release_year' in Anime_clean.columns:
    Episodes_stats = Anime_clean.groupby('Release_year')['Episodes'].agg(['mean', 'median', 'std', 'count'])
    print("Episode statistics by Year:")
    print(Episodes_stats)


7.4 Multiple Aggregation Functions:
------------------------------
Episode statistics by Year:
                   mean  median        std  count
Release_year                                     
1907.0        12.000000    12.0        NaN      1
1917.0        12.000000    12.0   0.000000     13
1918.0        12.000000    12.0   0.000000      6
1924.0        12.000000    12.0   0.000000      2
1925.0        12.000000    12.0   0.000000      4
...                 ...     ...        ...    ...
2019.0        14.572536    12.0  12.745548    903
2020.0        14.346893    12.0  13.169814    885
2021.0        13.245509    12.0  10.495898    835
2022.0        11.863071    12.0   1.222159    241
2023.0        12.000000    12.0   0.000000      4

[103 rows x 4 columns]
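
When different columns need different statistics, named aggregation gives each output column an explicit name. A minimal sketch on made-up data; the output column names (mean_episodes and so on) are chosen only for this example.

```python
import pandas as pd

# Hypothetical titles; one Rating is missing.
df = pd.DataFrame({
    'Type': ['TV', 'TV', 'Movie', 'Movie', 'TV'],
    'Episodes': [12, 24, 1, 1, 13],
    'Rating': [4.6, 4.0, 4.5, 3.5, None],
})

# Each keyword is an output column: name=(source column, aggregation).
summary = df.groupby('Type').agg(
    mean_episodes=('Episodes', 'mean'),
    max_episodes=('Episodes', 'max'),
    mean_rating=('Rating', 'mean'),
    n_rated=('Rating', 'count'),   # count ignores missing values
)
print(summary)
```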


In [37]:
# =============================================================================
# 8. DATA MERGING AND JOINING
# =============================================================================

print("\n" + "="*60)
print("8. DATA MERGING AND JOINING")
print("="*60)

# Create sample datasets for merging
print("\n8.1 Creating Sample Data for Merging:")
print("-" * 30)

# Passenger details
passenger_details = pd.DataFrame({
    'passenger_id': [1, 2, 3, 4, 5],
    'name': ['Alice Johnson', 'Bob Smith', 'Charlie Brown', 'Diana Prince', 'Eve Wilson'],
    'ticket_class': [1, 2, 3, 1, 2]
})

# Ticket information
ticket_info = pd.DataFrame({
    'passenger_id': [1, 2, 3, 6, 7],
    'ticket_number': ['A123', 'B456', 'C789', 'D012', 'E345'],
    'fare_paid': [100, 75, 50, 120, 80]
})

print("Passenger Details:")
print(passenger_details)
print("\nTicket Information:")
print(ticket_info)


8. DATA MERGING AND JOINING

8.1 Creating Sample Data for Merging:
------------------------------
Passenger Details:
   passenger_id           name  ticket_class
0             1  Alice Johnson             1
1             2      Bob Smith             2
2             3  Charlie Brown             3
3             4   Diana Prince             1
4             5     Eve Wilson             2

Ticket Information:
   passenger_id ticket_number  fare_paid
0             1          A123        100
1             2          B456         75
2             3          C789         50
3             6          D012        120
4             7          E345         80


In [38]:
print("\n8.2 Different Types of Merges:")
print("-" * 30)

# Inner join (default)
inner_merge = pd.merge(passenger_details, ticket_info, on='passenger_id')
print("Inner Join (only matching records):")
print(inner_merge)

# Left join
left_merge = pd.merge(passenger_details, ticket_info, on='passenger_id', how='left')
print("\nLeft Join (all records from left table):")
print(left_merge)

# Right join
right_merge = pd.merge(passenger_details, ticket_info, on='passenger_id', how='right')
print("\nRight Join (all records from right table):")
print(right_merge)

# Outer join
outer_merge = pd.merge(passenger_details, ticket_info, on='passenger_id', how='outer')
print("\nOuter Join (all records from both tables):")
print(outer_merge)


8.2 Different Types of Merges:
------------------------------
Inner Join (only matching records):
   passenger_id           name  ticket_class ticket_number  fare_paid
0             1  Alice Johnson             1          A123        100
1             2      Bob Smith             2          B456         75
2             3  Charlie Brown             3          C789         50

Left Join (all records from left table):
   passenger_id           name  ticket_class ticket_number  fare_paid
0             1  Alice Johnson             1          A123      100.0
1             2      Bob Smith             2          B456       75.0
2             3  Charlie Brown             3          C789       50.0
3             4   Diana Prince             1           NaN        NaN
4             5     Eve Wilson             2           NaN        NaN

Right Join (all records from right table):
   passenger_id           name  ticket_class ticket_number  fare_paid
0             1  Alice Johnson           1.0 

In [39]:
print("\n8.3 Merge with Different Column Names:")
print("-" * 30)

# Create data with different column names
passenger_info = pd.DataFrame({
    'id': [1, 2, 3],
    'passenger_name': ['Alice', 'Bob', 'Charlie']
})

booking_info = pd.DataFrame({
    'passenger_id': [1, 2, 4],
    'booking_date': ['2024-01-01', '2024-01-02', '2024-01-03']
})

# Merge with different column names
merge_diff_names = pd.merge(passenger_info, booking_info,
                          left_on='id', right_on='passenger_id', how='inner')
print("Merge with different column names:")
print(merge_diff_names)


8.3 Merge with Different Column Names:
------------------------------
Merge with different column names:
   id passenger_name  passenger_id booking_date
0   1          Alice             1   2024-01-01
1   2            Bob             2   2024-01-02


In [40]:
print("\n8.4 Concatenation:")
print("-" * 30)

# Concatenate DataFrames vertically
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

concat_vertical = pd.concat([df1, df2], ignore_index=True)
print("Vertical concatenation:")
print(concat_vertical)

# Concatenate horizontally
concat_horizontal = pd.concat([df1, df2], axis=1)
print("\nHorizontal concatenation:")
print(concat_horizontal)


8.4 Concatenation:
------------------------------
Vertical concatenation:
   A  B
0  1  3
1  2  4
2  5  7
3  6  8

Horizontal concatenation:
   A  B  A  B
0  1  3  5  7
1  2  4  6  8


In [73]:
# =============================================================================
# 9. PIVOT TABLES AND RESHAPING
# =============================================================================

print("\n" + "="*60)
print("9. PIVOT TABLES AND RESHAPING")
print("="*60)

print("\n9.1 Pivot Tables:")
print("-" * 30)

# Create a pivot table: average Rating by Type and Release_season.
# The guard skips the pivot silently if any required column is missing.
pivot_cols = {'Type', 'Release_season', 'Rating'}
if pivot_cols.issubset(Anime_drop_cols.columns):
    pivot_rating = pd.pivot_table(
        Anime_drop_cols,
        values='Rating',
        index='Type',
        columns='Release_season',
        aggfunc='mean',
        fill_value=0
    )
    print("Average Rating by Type and Release Season (Pivot Table):")
    print(pivot_rating)



9. PIVOT TABLES AND RESHAPING

9.1 Pivot Tables:
------------------------------
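
No table appears in the recorded output for this cell, which is consistent with the column guard evaluating to False for Anime_drop_cols. To show what the same `pd.pivot_table` call produces, here is a minimal sketch on a small hypothetical DataFrame. The rows and ratings are invented for illustration and do not come from the anime dataset.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    'Type': ['TV', 'TV', 'TV', 'Movie'],
    'Release_season': ['Spring', 'Spring', 'Fall', 'Spring'],
    'Rating': [4.0, 5.0, 3.0, 4.5]
})

# Same arguments as the notebook cell: mean Rating per Type and season
pivot_sample = pd.pivot_table(
    sample,
    values='Rating',
    index='Type',
    columns='Release_season',
    aggfunc='mean',
    fill_value=0
)
print(pivot_sample)

# The sample has no Movie released in Fall, so fill_value=0 fills that cell
print(pivot_sample.loc['Movie', 'Fall'])
```

Note that `fill_value=0` makes a missing combination look like a real average of zero. Leaving the default NaN may be the safer choice when zero is a plausible rating.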


In [74]:
# Pivot table with multiple values using Anime_drop_cols.
# The guard skips the pivot silently if any required column is missing.
multi_cols = {'Type', 'Episodes', 'Rating'}
if multi_cols.issubset(Anime_drop_cols.columns):
    pivot_multiple = pd.pivot_table(
        Anime_drop_cols,
        values=['Episodes', 'Rating'],
        index='Type',
        aggfunc='mean',
        fill_value=0
    )
    print("\nAverage Episodes and Rating by Anime Type:")
    print(pivot_multiple)
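
This cell has no recorded output, so the sketch below runs the same kind of multi-value pivot on a small hypothetical DataFrame and then reshapes the wide result back to long form with `melt`. The sample values and the column names `metric` and `mean_value` are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    'Type': ['TV', 'TV', 'Movie', 'Movie'],
    'Episodes': [12, 24, 1, 1],
    'Rating': [4.0, 5.0, 3.5, 4.5]
})

# One row per Type, one column per averaged value
pivot_multi = pd.pivot_table(
    sample,
    values=['Episodes', 'Rating'],
    index='Type',
    aggfunc='mean'
)
print(pivot_multi)

# melt() reshapes wide to long: one row per (Type, metric) pair
long_form = pivot_multi.reset_index().melt(
    id_vars='Type', var_name='metric', value_name='mean_value'
)
print(long_form)
```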


In [75]:
# =============================================================================
# 10. ADVANCED OPERATIONS
# =============================================================================

print("\n" + "="*60)
print("10. ADVANCED OPERATIONS")
print("="*60)

print("\n10.1 Index Operations:")
print("-" * 30)

# Set index using Anime_drop_cols DataFrame
if 'Name' in Anime_drop_cols.columns:
    # set_index returns a new DataFrame, so Anime_drop_cols is left unchanged
    df_with_index = Anime_drop_cols.set_index('Name')
    print(f"Shape after setting 'Name' as index: {df_with_index.shape}")
    print("First few rows with 'Name' as index:")
    print(df_with_index.head(3))

    # Reset index
    df_reset = df_with_index.reset_index()
    print(f"\nShape after resetting index: {df_reset.shape}")


10. ADVANCED OPERATIONS

10.1 Index Operations:
------------------------------
Shape after setting 'Name' as index: (18495, 4)
First few rows with 'Name' as index:
                                                    Rank   Type  Episodes  \
Name                                                                        
Demon Slayer: Kimetsu no Yaiba - Entertainment ...     1  TV         12.0   
Fruits Basket the Final Season                         2  TV         13.0   
Mo Dao Zu Shi 3                                        3  Web        12.0   

                                                                                                staff  
Name                                                                                                   
Demon Slayer: Kimetsu no Yaiba - Entertainment ...  Koyoharu Gotouge : Original Creator, Haruo Sot...  
Fruits Basket the Final Season                      Natsuki Takaya : Original Creator, Yoshihide I...  
Mo Dao Zu Shi 3                  
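
Setting an index is mainly useful for label-based lookups. The sketch below shows that on a tiny sample, along with the difference between `reset_index()` and `reset_index(drop=True)`. Two titles, with their Rank and Episodes values, are taken from the output above; the third row is hypothetical.

```python
import pandas as pd

# Small sample; 'Sample Title' and its values are hypothetical
sample = pd.DataFrame({
    'Name': ['Fruits Basket the Final Season', 'Mo Dao Zu Shi 3', 'Sample Title'],
    'Rank': [2, 3, 4],
    'Episodes': [13.0, 12.0, 24.0]
})

indexed = sample.set_index('Name')

# Label lookups are unambiguous only when the index is unique
print(indexed.index.is_unique)

# With 'Name' as the index, .loc can look up a row by title
print(indexed.loc['Mo Dao Zu Shi 3', 'Episodes'])

# reset_index() restores 'Name' as a column; drop=True discards it instead
restored = indexed.reset_index()
discarded = indexed.reset_index(drop=True)
print(restored.columns.tolist())
print(discarded.columns.tolist())
```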

In [77]:
print("\n10.2 Apply Lambda Functions:")
print("-" * 30)

# Apply function to create new columns
if 'Episodes' in Anime_no_na.columns:
    def episode_group(x):
        if x < 13:
            return 'Short'
        elif x < 26:
            return 'Medium'
        else:
            return 'Long'
    Anime_no_na.loc[:, 'Episodes_group'] = Anime_no_na['Episodes'].apply(episode_group)
    print("Episodes group distribution:")
    print(Anime_no_na['Episodes_group'].value_counts())

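# Apply a function row-wise (axis=1): each call receives one row as a Series,
# so the function can read any column (here it only uses 'Type')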
def categorize_type(row):
    if row['Type'] == 'TV':
        return 'Television'
    elif row['Type'] == 'Movie':
        return 'Movie'
    else:
        return 'Other'

Anime_no_na.loc[:, 'type_category'] = Anime_no_na.apply(categorize_type, axis=1)
print("\nType category distribution:")
print(Anime_no_na['type_category'].value_counts())


10.2 Apply Lambda Functions:
------------------------------
Episodes group distribution:
Episodes_group
Medium    20
Long      18
Short      2
Name: count, dtype: int64

Type category distribution:
type_category
Other    40
Name: count, dtype: int64
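
Every row above lands in 'Other'. That may mean the Type values do not exactly equal 'TV' or 'Movie', for example because of padding whitespace. This cause is an assumption; the output does not confirm it. The sketch below shows vectorized alternatives to both `apply` calls on a hypothetical sample: `pd.cut` for the episode bins, and `str.strip` plus `map` for the type labels. The padded value `'TV  '` is invented to illustrate the whitespace case.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; 'TV  ' is deliberately padded with trailing spaces
sample = pd.DataFrame({
    'Type': ['TV  ', 'Movie', 'Web'],
    'Episodes': [12, 25, 26]
})

# right=False gives the bins [-inf, 13), [13, 26), [26, inf),
# matching the x < 13 / x < 26 logic of episode_group
sample['Episodes_group'] = pd.cut(
    sample['Episodes'],
    bins=[-np.inf, 13, 26, np.inf],
    right=False,
    labels=['Short', 'Medium', 'Long']
)

# Strip whitespace, map known types, and label everything else 'Other'
type_labels = {'TV': 'Television', 'Movie': 'Movie'}
sample['type_category'] = (
    sample['Type'].str.strip().map(type_labels).fillna('Other')
)

print(sample)
```

On large frames these column-level operations usually run much faster than a row-wise `apply`, because they avoid calling a Python function once per row.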


In [45]:
print("\n10.3 Working with Dates:")
print("-" * 30)

# Create sample date data
date_range = pd.date_range(start='2024-01-01', end='2024-01-10', freq='D')
date_df = pd.DataFrame({
    'date': date_range,
    'value': np.random.randn(len(date_range))
})

print("Sample date data:")
print(date_df.head())

# Extract date components
date_df['year'] = date_df['date'].dt.year
date_df['month'] = date_df['date'].dt.month
date_df['dayofweek'] = date_df['date'].dt.dayofweek
date_df['weekday_name'] = date_df['date'].dt.day_name()

print("\nDate with extracted components:")
print(date_df[['date', 'year', 'month', 'dayofweek', 'weekday_name']].head())


10.3 Working with Dates:
------------------------------
Sample date data:
        date     value
0 2024-01-01 -1.180671
1 2024-01-02 -0.130740
2 2024-01-03  0.998933
3 2024-01-04  1.437842
4 2024-01-05  1.034724

Date with extracted components:
        date  year  month  dayofweek weekday_name
0 2024-01-01  2024      1          0       Monday
1 2024-01-02  2024      1          1      Tuesday
2 2024-01-03  2024      1          2    Wednesday
3 2024-01-04  2024      1          3     Thursday
4 2024-01-05  2024      1          4       Friday
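
The `.dt` accessor used above only works on datetime columns, while dates loaded from files often arrive as strings (as in the booking_date column of section 8.3). Here is a minimal sketch of converting them with `pd.to_datetime`. The `bookings` frame, its `amount` column and the invalid entry are hypothetical.

```python
import pandas as pd

# Hypothetical data with one unparseable entry
bookings = pd.DataFrame({
    'booking_date': ['2024-01-01', '2024-01-02', 'not a date'],
    'amount': [100, 150, 200]
})

# errors='coerce' converts unparseable strings to NaT instead of raising
bookings['booking_date'] = pd.to_datetime(
    bookings['booking_date'], format='%Y-%m-%d', errors='coerce'
)

print(bookings.dtypes)
print(bookings['booking_date'].dt.day_name().tolist())

# Count the values that failed to parse
print(bookings['booking_date'].isna().sum())
```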


In [78]:
print("\n10.4 Data Validation and Quality Checks:")
print("-" * 30)

# Check for duplicates
duplicates = Anime_no_na.duplicated().sum()
print(f"Number of duplicate rows: {duplicates}")

# Check data ranges
if 'Episodes' in Anime_no_na.columns:
    invalid_episodes = Anime_no_na[(Anime_no_na['Episodes'] < 0) | (Anime_no_na['Episodes'] > 1000)]
    print(f"Invalid episodes (< 0 or > 1000): {len(invalid_episodes)}")

# Memory usage
print(f"\nDataset memory usage: {Anime_no_na.memory_usage(deep=True).sum() / 1024:.2f} KB")


10.4 Data Validation and Quality Checks:
------------------------------
Number of duplicate rows: 0
Invalid episodes (< 0 or > 1000): 0

Dataset memory usage: 121.45 KB
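
A full-row duplicate check like the one above can report zero even when a key column repeats. The sketch below shows a key-based check with `duplicated(subset=..., keep=False)` and a range check written with `between`. The sample frame is hypothetical, and using Name as the key is an assumption for this illustration.

```python
import pandas as pd

# Hypothetical sample: 'A' appears twice with different Episodes values
sample = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'C'],
    'Episodes': [12, 24, -1, 1500]
})

# Full-row check: the two 'A' rows differ, so nothing is flagged
full_dupes = sample.duplicated().sum()
print(f"Full-row duplicates: {full_dupes}")

# Key-based check: keep=False flags every row that shares a Name
name_dupes = sample[sample.duplicated(subset=['Name'], keep=False)]
print(name_dupes)

# between() is inclusive on both ends by default; ~ inverts the mask.
# Unlike (x < 0) | (x > 1000), this form also flags NaN values.
out_of_range = sample[~sample['Episodes'].between(0, 1000)]
print(out_of_range)
```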


In [80]:
print("\n" + "="*60)
print("SUMMARY AND BEST PRACTICES")
print("="*60)

print("""
Key Pandas Concepts Covered:
1. Series and DataFrame creation and manipulation
2. Data loading and exploration
3. Data selection and indexing (iloc, loc)
4. Missing value handling (dropna, fillna)
5. Data filtering and querying (isin, query)
6. Aggregation and grouping operations
7. Data merging and joining
8. Pivot tables and data reshaping
9. Advanced operations (apply, dates)

Next Steps:
- Practice with different datasets
""")

# Final dataset summary
print(f"\nFinal cleaned dataset shape: {Anime_clean.shape}")
print(f"Columns: {Anime_clean.columns.tolist()}")
print("\nData cleaning complete! Dataset ready for analysis.")


SUMMARY AND BEST PRACTICES

Key Pandas Concepts Covered:
1. Series and DataFrame creation and manipulation
2. Data loading and exploration
3. Data selection and indexing (iloc, loc)
4. Missing value handling (dropna, fillna)
5. Data filtering and querying (isin, query)
6. Aggregation and grouping operations
7. Data merging and joining
8. Pivot tables and data reshaping
9. Advanced operations (apply, dates)

Next Steps:
- Practice with different datasets


Final cleaned dataset shape: (18495, 17)

Data cleaning complete! Dataset ready for analysis.
