Perform the following operations on a data set using Python: read data
from different formats (such as CSV and XLS), index and select data, sort data,
describe the attributes of the data, and check the data type of each column.
(Use the Titanic dataset.)

In [1]:
import pandas as pd

In [2]:
# Read data from CSV format
# Ensure 'Titanic.csv' is in the same directory or provide the full path
df = pd.read_csv('Titanic.csv')

# Read data from Excel format (example only, left commented out)
# Note: this needs an Excel engine installed ('openpyxl' for .xlsx,
# 'xlrd' for legacy .xls) and an actual Excel file
# df_excel = pd.read_excel('Titanic.xlsx')

print("Data loaded successfully.")
print(df.head()) # Display first 5 rows to verify

Data loaded successfully.
   PassengerId  Survived  Pclass  \
0          892         0       3   
1          893         1       3   
2          894         0       2   
3          895         0       3   
4          896         1       3   

                                           Name     Sex   Age  SibSp  Parch  \
0                              Kelly, Mr. James    male  34.5      0      0   
1              Wilkes, Mrs. James (Ellen Needs)  female  47.0      1      0   
2                     Myles, Mr. Thomas Francis    male  62.0      0      0   
3                              Wirz, Mr. Albert    male  27.0      0      0   
4  Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female  22.0      1      1   

    Ticket     Fare Cabin Embarked  
0   330911   7.8292   NaN        Q  
1   363272   7.0000   NaN        S  
2   240276   9.6875   NaN        Q  
3   315154   8.6625   NaN        S  
4  3101298  12.2875   NaN        S  
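The loading step can also be tried without any file on disk. The sketch below is only an illustration: the three sample rows are copied from the preview above, and the names csv_text, sample and small are made up for this example. It assumes nothing beyond pandas and the standard library.

```python
import io
import pandas as pd

# Small sample in the Titanic column layout (rows copied from the preview;
# the names are quoted because they contain commas).
csv_text = '''PassengerId,Survived,Pclass,Name,Sex,Age
892,0,3,"Kelly, Mr. James",male,34.5
893,1,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0
894,0,2,"Myles, Mr. Thomas Francis",male,62.0
'''

# read_csv accepts a file-like object as well as a path
sample = pd.read_csv(io.StringIO(csv_text))

# usecols restricts parsing to the listed columns;
# index_col uses one of them as the row labels
small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['PassengerId', 'Name', 'Age'],
    index_col='PassengerId',
)

print(sample.shape)
print(small)
```

Loading only the needed columns can save memory on larger files; for a file the size of the Titanic data it is mostly a convenience.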


In [3]:
# 1. Selecting specific columns
# Select a single column
ages = df['Age']
# Select multiple columns
subset = df[['Name', 'Sex', 'Age', 'Survived']]

# 2. Indexing with iloc (Integer Location)
# Select first 5 rows and first 3 columns
rows_iloc = df.iloc[:5, :3]
print(rows_iloc)
print('\n')

# 3. Indexing with loc (Label/Condition based)
# Select rows where Age is greater than 76 (no passenger in this file is older, so the result is empty)
older_passengers = df.loc[df['Age'] > 76]
print("hi ",older_passengers)
print('\n')

print("Indexing examples executed.")
print(subset.head())

   PassengerId  Survived  Pclass
0          892         0       3
1          893         1       3
2          894         0       2
3          895         0       3
4          896         1       3


hi  Empty DataFrame
Columns: [PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked]
Index: []


Indexing examples executed.
                                           Name     Sex   Age  Survived
0                              Kelly, Mr. James    male  34.5         0
1              Wilkes, Mrs. James (Ellen Needs)  female  47.0         1
2                     Myles, Mr. Thomas Francis    male  62.0         0
3                              Wirz, Mr. Albert    male  27.0         0
4  Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female  22.0         1
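The difference between iloc and loc is easier to see when the index labels are not 0, 1, 2, and so on. The following is a minimal sketch on a made-up frame (toy and its values are hypothetical, not taken from the Titanic file); it also shows how two conditions can be combined in one loc call.

```python
import pandas as pd

# Hypothetical toy frame; the index labels are deliberately not 0..n-1
toy = pd.DataFrame(
    {
        'Name': ['A', 'B', 'C', 'D'],
        'Sex': ['male', 'female', 'male', 'female'],
        'Age': [34.5, 47.0, None, 62.0],
    },
    index=[10, 20, 30, 40],
)

# iloc is positional and excludes the stop position: positions 0 and 1
first_two = toy.iloc[0:2]

# loc is label-based and includes the stop label: labels 10, 20 and 30
by_label = toy.loc[10:30]

# Boolean masks combine with & and |; each condition needs its own parentheses.
# A comparison against a missing Age is False, so that row drops out.
older_women = toy.loc[(toy['Age'] > 40) & (toy['Sex'] == 'female'), ['Name', 'Age']]

print(len(first_two), len(by_label))
print(older_women)
```

The second argument to loc selects columns, so filtering rows and picking columns can happen in a single step.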


In [4]:
# Sort data by 'Age' in ascending order
sorted_by_age = df.sort_values(by='Age', ascending=True)

# Sort data by 'Fare' in descending order
sorted_by_fare = df.sort_values(by='Fare', ascending=False)

print("Data sorted by Fare (Top 5):")
print(sorted_by_fare[['Name', 'Fare']].head())

Data sorted by Fare (Top 5):
                                                  Name      Fare
343  Cardeza, Mrs. James Warburton Martinez (Charlo...  512.3292
69                 Fortune, Mrs. Mark (Mary McDougald)  263.0000
53                          Fortune, Miss. Ethel Flora  263.0000
59                         Chaudanson, Miss. Victorine  262.3750
64                         Ryerson, Master. John Borie  262.3750
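Sorting often involves more than one column, and columns with missing values (such as Age here) raise the question of where those rows end up. The sketch below uses a made-up frame (toy and its values are hypothetical) to show a multi-key sort and the na_position parameter.

```python
import pandas as pd

# Hypothetical toy frame with one missing Age
toy = pd.DataFrame(
    {
        'Pclass': [3, 1, 1, 2],
        'Fare': [7.0, 263.0, 26.0, 13.0],
        'Age': [34.5, None, 47.0, 62.0],
    },
    index=['a', 'b', 'c', 'd'],
)

# Sort by class ascending, then by fare descending within each class
ranked = toy.sort_values(by=['Pclass', 'Fare'], ascending=[True, False])

# Missing values go last by default, even when sorting in descending order
oldest_first = toy.sort_values(by='Age', ascending=False)

# na_position='first' moves rows with a missing sort key to the top
nan_first = toy.sort_values(by='Age', na_position='first')

# reset_index(drop=True) replaces the shuffled labels with 0..n-1
renumbered = ranked.reset_index(drop=True)

print(ranked)
print(oldest_first.index.tolist())
print(nan_first.index.tolist())
```

sort_values returns a new DataFrame and keeps the original row labels, which is why the output above shows labels such as 343 and 69 rather than 0 to 4.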


In [5]:
# Generate descriptive statistics (count, mean, std, min, max, etc.)
description = df.describe()

print("Statistical Description of Numerical Columns:")
print(description)

# Check the shape of the dataset (rows, columns)
print(f"\nDataset Shape: {df.shape}")

# List attributes (column names)
print(f"\nAttributes: {df.columns.tolist()}")

Statistical Description of Numerical Columns:
       PassengerId    Survived      Pclass         Age       SibSp  \
count   418.000000  418.000000  418.000000  332.000000  418.000000   
mean   1100.500000    0.363636    2.265550   30.272590    0.447368   
std     120.810458    0.481622    0.841838   14.181209    0.896760   
min     892.000000    0.000000    1.000000    0.170000    0.000000   
25%     996.250000    0.000000    1.000000   21.000000    0.000000   
50%    1100.500000    0.000000    3.000000   27.000000    0.000000   
75%    1204.750000    1.000000    3.000000   39.000000    1.000000   
max    1309.000000    1.000000    3.000000   76.000000    8.000000   

            Parch        Fare  
count  418.000000  417.000000  
mean     0.392344   35.627188  
std      0.981429   55.907576  
min      0.000000    0.000000  
25%      0.000000    7.895800  
50%      0.000000   14.454200  
75%      0.000000   31.500000  
max      9.000000  512.329200  

Dataset Shape: (418, 12)

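Attributes: ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']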
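By default describe() summarises only the numeric columns, which is why Name, Sex, Ticket, Cabin and Embarked are absent from the table above. A minimal sketch of describing the text columns follows; the frame toy and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with two text columns and one numeric column
toy = pd.DataFrame(
    {
        'Sex': ['male', 'female', 'male', 'male'],
        'Embarked': ['Q', 'S', 'Q', None],
        'Age': [34.5, 47.0, 62.0, 27.0],
    }
)

# Excluding numeric columns leaves the text columns, which are summarised
# by count (non-missing values), unique, top (most frequent) and freq
text_summary = toy.describe(exclude=['number'])

# value_counts gives the full frequency table for a single column
sex_counts = toy['Sex'].value_counts()

print(text_summary)
print(sex_counts)
```

Passing include='all' instead would describe every column in one table, with NaN in the cells where a statistic does not apply.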

In [6]:
# Check data types of each column
print("Data Types of Each Column:")
print(df.dtypes)

# Detailed information including non-null counts
print("\nDetailed Info:")
df.info()

Data Types of Each Column:
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
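Once the data types are known, a common next step is to pick columns by type, convert a column, or count missing values. The following is only a sketch on a made-up frame (toy and its values are hypothetical); converting Sex to a categorical type is an illustrative choice, not something the exercise requires.

```python
import pandas as pd

# Hypothetical toy frame with an integer, a text and a float column
toy = pd.DataFrame(
    {
        'Pclass': [3, 3, 2, 1],
        'Sex': ['male', 'female', 'male', 'female'],
        'Age': [34.5, None, 62.0, 27.0],
    }
)

# select_dtypes filters columns by type; 'number' covers ints and floats
numeric_cols = toy.select_dtypes(include='number').columns.tolist()

# astype returns a converted copy; 'category' suits columns
# that hold only a few distinct values
toy['Sex'] = toy['Sex'].astype('category')

# isna().sum() counts the missing values in each column
missing = toy.isna().sum()

print(numeric_cols)
print(toy.dtypes)
print(missing)
```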

Detailed Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  418 non-null    int64  
 1   Survived     418 non-null    int64  
 2   Pclass       418 non-null    int64  
 3   Name         418 non-null    object 
 4   Sex          418 non-null    object 
 5   Age          332 non-null    float64
 6   SibSp        418 non-null    int64  
 7   Parch        418 non-null    int64  
 8   Ticket       418 non-null    object 
 9   Fare         417 non-null    float64
 10  Cabin        91 non-null     object 
 11  Embarked   