Week 1 – Day 3: pandas Essentials

Objective of the day
Learn how to work with tabular data (rows & columns) using pandas. By the end of today, you’ll know how to load, filter, group, and summarize datasets, all of which are critical steps when preparing data before training ML models.

1. Load dataset again

In [1]:
import pandas as pd

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv"
df = pd.read_csv(url)
print(df.head())


   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town alive  alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes  False  
2  woman       False  NaN  Southampton   yes   True  
3  woman       False    C  Southampton   yes  False  
4    man        True  NaN  Southampton    no   True  


2. Inspect data

In [2]:
print(df.shape)    # rows, columns
print(df.columns)  # column names
print(df.describe()) # summary statistics
df.info()          # datatypes + nulls (prints directly and returns None, so no print() needed)

(891, 15)
Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
       'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
       'alive', 'alone'],
      dtype='object')
         survived      pclass         age       sibsp       parch        fare
count  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean     0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std      0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min      0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%      0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%      0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%      1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max      1.000000    3.000000   80.000000    8.000000    6.000000  512.329200
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 colu
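
The `count` row of `describe()` above shows 714 for `age` against 891 rows, so some ages are missing. A minimal sketch of counting missing values per column follows. It uses a small made-up DataFrame (`toy` is hypothetical, not the Titanic data) so it runs without a network connection:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic DataFrame
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "deck": ["C", None, None, "E"],
    "fare": [7.25, 71.28, 7.93, 53.10],
})

missing_counts = toy.isna().sum()   # number of missing values per column
missing_share = toy.isna().mean()   # fraction of missing values per column
print(missing_counts)
print(missing_share)
```

The same two calls should work unchanged on the Titanic `df`, for example `df.isna().sum()`.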

3. Select columns & rows

In [None]:
print(df["age"].head())   # single column
print(df[["sex","age"]].head())  # multiple columns

print(df.iloc[0])   # first row (by position)
print(df.loc[0])    # first row (by label)

# df.iloc[0] means "give me the first row, no matter what the row label is."
# df.loc[0] means "give me the row whose index label is 0."
#
# ⚠️ If your DataFrame's index is not 0, 1, 2, ... but something else
# (like passenger IDs, dates, etc.), then loc[0] may raise a KeyError
# or return a different row than iloc[0].
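
To make the difference between position and label concrete, here is a small sketch with a DataFrame whose index is deliberately not 0, 1, 2. The data and the names `toy`, `by_position`, and `by_label` are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data with non-default index labels
toy = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 28]},
    index=[10, 0, 20],
)

by_position = toy.iloc[0]   # first row by position (the row labelled 10)
by_label = toy.loc[0]       # row whose index label is 0 (the second row)
print(by_position["name"], by_label["name"])

# Asking loc for a label that does not exist raises KeyError
try:
    toy.loc[5]
except KeyError:
    print("label 5 not found")
```

Here `iloc[0]` and `loc[0]` return different rows, which is exactly the situation the warning describes.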

4. Filtering

In [None]:
# All passengers older than 60
print(df[df["age"] > 60])

# All female passengers in 1st class
print(df[(df["sex"] == "female") & (df["pclass"] == 1)])
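
Conditions are combined with `&` (and), `|` (or), and `~` (not), and each condition needs its own parentheses because these operators bind more tightly than `==` or `>`. A short sketch on made-up data (`toy` and the result names are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic dataset
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "pclass": [3, 1, 3, 1],
    "age": [22.0, 38.0, 26.0, 65.0],
})

# AND: both conditions must hold
first_class_women = toy[(toy["sex"] == "female") & (toy["pclass"] == 1)]

# OR: at least one condition must hold
women_or_over_60 = toy[(toy["sex"] == "female") | (toy["age"] > 60)]

# NOT: invert a condition
not_third_class = toy[~(toy["pclass"] == 3)]

# isin: match any value from a list
upper_classes = toy[toy["pclass"].isin([1, 2])]

print(len(first_class_women), len(women_or_over_60))
```

Using Python's `and` / `or` keywords instead of `&` / `|` would raise an error here, because pandas cannot reduce a whole column of booleans to a single truth value.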

5. Grouping & aggregation

In [7]:
# Average age by sex
print(df.groupby("sex")["age"].mean())

# Survival rate by class
print(df.groupby("pclass")["survived"].mean())

sex
female    27.915709
male      30.726645
Name: age, dtype: float64
pclass
1    0.629630
2    0.472826
3    0.242363
Name: survived, dtype: float64
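
When several summaries per group are needed at once, `agg` with named aggregations can produce them in one table. This is a sketch on made-up data; `toy`, `summary`, and the output column names are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic dataset
toy = pd.DataFrame({
    "pclass": [1, 1, 3, 3, 3],
    "survived": [1, 1, 0, 1, 0],
    "fare": [80.0, 60.0, 8.0, 7.0, 9.0],
})

# One row per class, one column per named aggregation
summary = toy.groupby("pclass").agg(
    passengers=("survived", "count"),     # non-null rows in the group
    survival_rate=("survived", "mean"),   # mean of 0/1 values = share of survivors
    mean_fare=("fare", "mean"),
)
print(summary)
```

Because `survived` holds 0 and 1, its mean is the fraction of survivors, which is why `mean()` gives a survival rate in the cell above.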


📊 Exercise of the Day

Find the average age of survivors vs. non-survivors.

Find the survival rate of males vs. females.

Find the average fare paid per passenger class (pclass).

In [10]:
# Average age of survivors vs. non-survivors

print(df.groupby("survived")["age"].mean())

# Survival rate by sex
print()
print(df.groupby("sex")["survived"].mean())

# Average fare paid per class
print()
print(df.groupby("pclass")["fare"].mean())

survived
0    30.626179
1    28.343690
Name: age, dtype: float64

sex
female    0.742038
male      0.188908
Name: survived, dtype: float64

pclass
1    84.154687
2    20.662183
3    13.675550
Name: fare, dtype: float64


🌟 Mini-Challenge

Who had a better chance of surviving:

Women in 1st class

Men in 3rd class

👉 Use filtering + groupby to calculate survival rates for both groups. Then explain in plain words what the numbers tell you.

In [13]:

print(df.groupby(["sex", "pclass"])["survived"].mean())

sex     pclass
female  1         0.968085
        2         0.921053
        3         0.500000
male    1         0.368852
        2         0.157407
        3         0.135447
Name: survived, dtype: float64
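
The challenge also asks for filtering. One possible sketch is a small helper that filters to a single group and takes the mean of `survived`. The function `survival_rate` and the DataFrame `toy` are hypothetical and only stand in for the Titanic data so the block runs on its own:

```python
import pandas as pd

def survival_rate(frame, sex, pclass):
    """Share of survivors among rows matching sex and pclass (hypothetical helper)."""
    group = frame[(frame["sex"] == sex) & (frame["pclass"] == pclass)]
    return group["survived"].mean()

# Hypothetical toy data with the same column names as the Titanic dataset
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "male"],
    "pclass": [1, 1, 3, 3, 3, 3],
    "survived": [1, 1, 0, 0, 1, 0],
})

women_first = survival_rate(toy, "female", 1)
men_third = survival_rate(toy, "male", 3)
print(women_first, men_third)
```

Called with the Titanic `df` from step 1, the two values should match the grouped output above: about 0.97 for women in 1st class and about 0.14 for men in 3rd class. In plain words, roughly 97 of every 100 first-class women survived, compared with about 14 of every 100 third-class men.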
