📌 Pandas in Python

Pandas is a fast, powerful, and easy-to-use open-source library for data manipulation and analysis in Python.
It provides two main data structures:

Series → 1D labeled array (like an Excel column).

DataFrame → 2D labeled data structure (like an Excel table).

1. Installing Pandas:
pip install pandas
2. Importing Pandas:
import pandas as pd

In [4]:
# Pandas Series example
# A Series is a one-dimensional labeled array.

import pandas as pd
data = [7, 14, 21, 28, 35]
s = pd.Series(data, index=["a", "b", "c", "d", "e"])

print(s)

a     7
b    14
c    21
d    28
e    35
dtype: int64


In [5]:
print(s["b"])

14
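As a small follow-up sketch, the same Series can also be read by integer position, sliced by label, and used in element-wise arithmetic. The variable names (first_by_position, label_slice, doubled) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same Series as above: five multiples of 7 with letter labels
s = pd.Series([7, 14, 21, 28, 35], index=["a", "b", "c", "d", "e"])

first_by_position = s.iloc[0]   # integer position 0 is the value at label "a"
label_slice = s.loc["b":"d"]    # label slices include both endpoints
doubled = s * 2                 # arithmetic is applied to every element

print(first_by_position)
print(label_slice)
print(doubled)
```

Note that a label slice keeps the end label, unlike ordinary Python list slicing.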


In [6]:
#Pandas DataFrame Example
#A DataFrame is a 2D table with rows and columns.

import pandas as pd

# Build a DataFrame from a dict of equal-length lists (one key per column)
data = {
    "Name": ["Trupti", "Bobo", "Suzi", "Jemmy"],
    "Age": [21, 23, 25, 27],
    "City": ["London", "New York", "Japan", "Paris"]
}
df = pd.DataFrame(data)
print(df)

     Name  Age      City
0  Trupti   21    London
1    Bobo   23  New York
2    Suzi   25     Japan
3   Jemmy   27     Paris


In [12]:
print(df["Name"])

0    Trupti
1      Bobo
2      Suzi
3     Jemmy
Name: Name, dtype: object


In [13]:
print(df["Age"])

0    21
1    23
2    25
3    27
Name: Age, dtype: int64


In [14]:
print(df["City"])

0      London
1    New York
2       Japan
3       Paris
Name: City, dtype: object


In [16]:
print(df.Name)  # Attribute access; same result as df["Name"] when the column name is a valid Python identifier

0    Trupti
1      Bobo
2      Suzi
3     Jemmy
Name: Name, dtype: object


In [19]:
# Basic DataFrame Operations
print(df.head())       # First 5 rows (all 4 here, since the frame is short)
print(df.tail())       # Last 5 rows (again all 4 here)
df.info()              # Summary info; info() prints directly and returns None
print(df.describe())   # Summary statistics for numeric columns
print(df.shape)        # (rows, columns)

     Name  Age      City
0  Trupti   21    London
1    Bobo   23  New York
2    Suzi   25     Japan
3   Jemmy   27     Paris
     Name  Age      City
0  Trupti   21    London
1    Bobo   23  New York
2    Suzi   25     Japan
3   Jemmy   27     Paris
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     4 non-null      int64 
 2   City    4 non-null      object
dtypes: int64(1), object(2)
memory usage: 228.0+ bytes
             Age
count   4.000000
mean   24.000000
std     2.581989
min    21.000000
25%    22.500000
50%    24.000000
75%    25.500000
max    27.000000
(4, 3)
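The overview calls above can be narrowed down. As a rough sketch, head() and tail() accept a row count, and single statistics from describe() can be computed directly on a column. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Trupti", "Bobo", "Suzi", "Jemmy"],
    "Age": [21, 23, 25, 27],
    "City": ["London", "New York", "Japan", "Paris"],
})

first_two = df.head(2)        # head(n) limits the preview to n rows
last_one = df.tail(1)         # tail(n) does the same from the end
mean_age = df["Age"].mean()   # matches the "mean" row of describe()
std_age = df["Age"].std()     # sample standard deviation, as in describe()

print(first_two)
print(last_one)
print(mean_age, round(std_age, 6))
```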


In [21]:
#Selecting Data
# Select column
print(df["Name"])

# Select multiple columns
print(df[["Name", "City"]])

# Select row by integer position (iloc)
print(df.iloc[1])

# Select row by index label (loc); the labels here are the default 0..3
print(df.loc[0])



0    Trupti
1      Bobo
2      Suzi
3     Jemmy
Name: Name, dtype: object
     Name      City
0  Trupti    London
1    Bobo  New York
2    Suzi     Japan
3   Jemmy     Paris
Name        Bobo
Age           23
City    New York
Name: 1, dtype: object
Name    Trupti
Age         21
City    London
Name: 0, dtype: object
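With the default integer index, iloc and loc look alike, but they slice differently. A minimal sketch of that difference, plus selecting a single cell, using the same data as above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Trupti", "Bobo", "Suzi", "Jemmy"],
    "Age": [21, 23, 25, 27],
    "City": ["London", "New York", "Japan", "Paris"],
})

rows_by_position = df.iloc[0:2]   # positions 0 and 1; the end is excluded
rows_by_label = df.loc[0:2]       # labels 0, 1 and 2; the end is included
cell = df.loc[1, "City"]          # one value: row label 1, column "City"

print(rows_by_position)
print(rows_by_label)
print(cell)
```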


In [23]:
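# Filtering Data: keep the rows where the condition is True
# (every age here is above 20, so all four rows are returned)
print(df[df["Age"] > 20])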

     Name  Age      City
0  Trupti   21    London
1    Bobo   23  New York
2    Suzi   25     Japan
3   Jemmy   27     Paris
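A stricter threshold, or several conditions combined, makes the effect of filtering easier to see. The sketch below reuses the same data; the thresholds and city lists are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Trupti", "Bobo", "Suzi", "Jemmy"],
    "Age": [21, 23, 25, 27],
    "City": ["London", "New York", "Japan", "Paris"],
})

# A single condition that excludes some rows
older_than_23 = df[df["Age"] > 23]

# Combine conditions with & (and) or | (or); each one needs its own parentheses
mask = (df["Age"] > 21) & (df["City"] != "Paris")
selected = df[mask]

# isin() tests membership in a list of values
in_cities = df[df["City"].isin(["London", "Paris"])]

print(older_than_23)
print(selected)
print(in_cities)
```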


In [26]:
# Add new column (the list needs one value per row)
df["Salary"] = [90000, 60000, 70000, 30000]


In [27]:
# Modify column
df["Age"] = df["Age"] + 1


In [28]:
print(df.Age)

0    22
1    24
2    26
3    28
Name: Age, dtype: int64
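Columns can also be derived from existing ones without changing the original frame. The sketch below uses assign(), which returns a new DataFrame; the MonthlySalary column is a hypothetical name introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Trupti", "Bobo", "Suzi", "Jemmy"],
    "Salary": [90000, 60000, 70000, 30000],
})

# assign() returns a copy with the extra column; df itself is left as it was
with_monthly = df.assign(MonthlySalary=df["Salary"] / 12)

print(with_monthly)
print(list(df.columns))
```

Assigning with df["..."] = ... changes the frame in place, while assign() is handy when the original should stay untouched.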


In [29]:
# Group by City and calculate mean age
print(df.groupby("City")["Age"].mean())


City
Japan       26.0
London      22.0
New York    24.0
Paris       28.0
Name: Age, dtype: float64
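In the output above every city appears once, so each group mean is just that person's age. Grouping is more informative when groups contain several rows. The sketch below uses made-up data with repeated cities and computes two aggregates at once with agg().

```python
import pandas as pd

# Hypothetical data: two people per city
people = pd.DataFrame({
    "City": ["London", "London", "Paris", "Paris"],
    "Age": [22, 24, 26, 28],
})

# One row per city, with the mean age and the number of rows in each group
summary = people.groupby("City")["Age"].agg(["mean", "count"])

print(summary)
```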


In [30]:
# Drop rows that contain missing values (this df has none, so nothing is removed)
df.dropna(inplace=True)


In [31]:
# Fill missing values with 0 (no effect here, since df has no missing values)
df.fillna(0, inplace=True)
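Since the DataFrame used so far has no gaps, here is a minimal sketch on made-up data with one missing age, showing how to count missing values and what dropna() and fillna() return. The frame df_missing is hypothetical.

```python
import pandas as pd

# Hypothetical data: Bobo's age is missing (NaN)
df_missing = pd.DataFrame({
    "Name": ["Trupti", "Bobo", "Suzi"],
    "Age": [21, float("nan"), 25],
})

n_missing = df_missing["Age"].isna().sum()   # count missing values in a column
dropped = df_missing.dropna()                # new frame without the incomplete row
filled = df_missing.fillna({"Age": 0})       # new frame with missing Age set to 0

print(n_missing)
print(dropped)
print(filled)
```

Without inplace=True, both methods return a new DataFrame and leave df_missing unchanged, which makes it easier to compare the two approaches.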

Data Analysis & Cleaning
Pandas is commonly used for:

    - Data wrangling (cleaning messy data)

    - Feature engineering in machine learning

    - Exploratory data analysis (EDA)
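The three uses listed above can be sketched in a few lines. The data, the Passed column, and the pass mark of 65 are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical messy input: inconsistent case and stray spaces in names
raw = pd.DataFrame({
    "Name": [" trupti", "BOBO ", "suzi"],
    "Marks": [90, 62, 70],
})

# Data wrangling: trim whitespace and normalize capitalization
raw["Name"] = raw["Name"].str.strip().str.title()

# Feature engineering: derive a boolean column from an existing one
raw["Passed"] = raw["Marks"] >= 65   # 65 is an arbitrary pass mark

# Exploratory analysis: count how often each value occurs
print(raw)
print(raw["Passed"].value_counts())
```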

In [32]:
# Student example: build a DataFrame, select a column, filter rows, and compute a mean
import pandas as pd
data = {
    "Name": ["Trupti", "Bobo", "Jemmy", "Suzi"],
    "Marks": [90, 62, 92, 70],
    "City": ["New York", "London", "Paris", "Berlin"]
}

df = pd.DataFrame(data)

print("Full DataFrame:")
print(df)

print("\nOnly Names:")
print(df["Name"])

print("\nStudents with Marks > 70:")
print(df[df["Marks"] > 70])

print("\nAverage Marks:", df["Marks"].mean())


Full DataFrame:
     Name  Marks      City
0  Trupti     90  New York
1    Bobo     62    London
2   Jemmy     92     Paris
3    Suzi     70    Berlin

Only Names:
0    Trupti
1      Bobo
2     Jemmy
3      Suzi
Name: Name, dtype: object

Students with Marks > 70:
     Name  Marks      City
0  Trupti     90  New York
2   Jemmy     92     Paris

Average Marks: 78.5
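As a possible next step for the student example, the rows can be ranked by marks and the top scorer looked up. This is only a sketch on the same data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Trupti", "Bobo", "Jemmy", "Suzi"],
    "Marks": [90, 62, 92, 70],
    "City": ["New York", "London", "Paris", "Berlin"],
})

# Sort from highest to lowest marks
ranked = df.sort_values("Marks", ascending=False)

# idxmax() returns the index label of the row with the highest marks
top_student = df.loc[df["Marks"].idxmax(), "Name"]

print(ranked)
print("Top student:", top_student)
```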
