# 📊 Exploratory Data Analysis on Cardiotocographic Dataset
#### **Date:** July 20, 2025
#### **Objective:**
To analyze the structure, patterns, and relationships in fetal cardiotocographic data.


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## 1. 📥 Load and Preview Dataset

In [None]:
# Load dataset
df = pd.read_csv('Cardiotocographic.csv')
df.head()
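It can help to confirm that the file loaded with the expected columns and that they are numeric before any further analysis. The following is a minimal sketch that uses a tiny in-memory CSV in place of `Cardiotocographic.csv`. The sample rows are invented, and the `expected` list is an assumption built only from the columns this notebook mentions.

```python
import io
import pandas as pd

# Hypothetical two-row sample; in the notebook this is pd.read_csv('Cardiotocographic.csv')
csv_text = (
    "LB,AC,DL,DS,DP,ASTV,ALTV,MLTV,Width,Tendency,NSP\n"
    "120,0.000,0.000,0,0,73,43,2.4,64,1,2\n"
    "132,0.006,0.003,0,0,17,0,10.4,130,0,1\n"
)
sample = pd.read_csv(io.StringIO(csv_text))

# Columns referenced later in this notebook (assumed list, not the full schema)
expected = ['LB', 'AC', 'DL', 'DS', 'DP', 'ASTV', 'ALTV',
            'MLTV', 'Width', 'Tendency', 'NSP']
missing_cols = [c for c in expected if c not in sample.columns]

# Any column pandas could not parse as a number ends up here
non_numeric = sample.select_dtypes(exclude='number').columns.tolist()

print("Missing columns:", missing_cols)
print("Non-numeric columns:", non_numeric)
```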

## 2. 🧹 Data Cleaning and Preparation

In [None]:
# Check missing values
df.isnull().sum()
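Raw missing counts are easier to judge as a share of rows. The following is a short sketch on a made-up frame, where `sample` stands in for `df` and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps, standing in for df
sample = pd.DataFrame({
    'LB':   [120, 132, 133, 134],
    'AC':   [0.0, np.nan, 0.003, 0.003],
    'MLTV': [2.4, 10.4, np.nan, np.nan],
})

missing = sample.isnull().sum()
missing_pct = (sample.isnull().mean() * 100).round(1)
report = pd.DataFrame({'missing': missing, 'percent': missing_pct})

# Keep only columns that actually have gaps, worst first
report = report[report['missing'] > 0].sort_values('percent', ascending=False)
print(report)
```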

In [None]:
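# Fill missing values with the column median.
# Assign the result back rather than calling fillna(inplace=True) on a column:
# that is chained assignment, which is deprecated in recent pandas.
for col in ['AC', 'DS', 'DP', 'MLTV', 'Width', 'Tendency', 'NSP']:
    if col in df.columns:
        df[col] = df[col].fillna(df[col].median())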

# Remove duplicates
df.drop_duplicates(inplace=True)
df.shape
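In principle, filling the class label `NSP` with the median can produce a value between two classes, because pandas averages the two middle values for an even number of rows. The sketch below keeps the median for continuous features but uses the mode for the label. This is a swapped-in choice, not the notebook's method. It also reports how many duplicate rows are dropped. The frame and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df: one gap in a continuous
# feature, one gap in the class label, and one duplicated row
sample = pd.DataFrame({
    'LB':   [120.0, 132.0, np.nan, 132.0, 140.0],
    'ASTV': [73.0, 17.0, 16.0, 17.0, 20.0],
    'NSP':  [2.0, 1.0, 1.0, 1.0, np.nan],
})

# Median for continuous features
for col in sample.columns.drop('NSP'):
    sample[col] = sample[col].fillna(sample[col].median())

# Mode (most frequent class) for the categorical label
sample['NSP'] = sample['NSP'].fillna(sample['NSP'].mode()[0])

# Count duplicates before dropping them
n_dupes = sample.duplicated().sum()
sample = sample.drop_duplicates().reset_index(drop=True)
print(n_dupes, sample.shape)
```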

## 3. 📊 Statistical Summary

In [None]:
df.describe().T
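In the `describe()` table, a large gap between `mean` and `50%` (the median) hints at a skewed feature. The sketch below adds that gap and pandas' sample skewness as extra columns. It runs on two invented features: one roughly symmetric, and one mostly zero with a long right tail.

```python
import pandas as pd

# Hypothetical values: a symmetric feature and a mostly-zero, right-tailed one
sample = pd.DataFrame({
    'LB': [120, 125, 130, 135, 140],
    'DP': [0.0, 0.0, 0.0, 0.0, 0.002],
})

summary = sample.describe().T
summary['mean_minus_median'] = summary['mean'] - summary['50%']
summary['skew'] = sample.skew()  # aligned on column names
print(summary[['mean', '50%', 'mean_minus_median', 'skew']])
```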

## 4. 📉 Data Visualization

In [None]:
# Histograms
df.hist(figsize=(20, 15), bins=30)
plt.tight_layout()
plt.show()
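Features dominated by zeros appear in the histograms as a single tall bar at 0. The sketch below measures that share directly per column. The frame is hypothetical, with column names borrowed from the notebook.

```python
import pandas as pd

# Hypothetical sample; column names echo the notebook's LB, DL and DP
sample = pd.DataFrame({
    'LB': [120, 132, 133, 134],
    'DL': [0.0, 0.003, 0.0, 0.0],
    'DP': [0.0, 0.0, 0.0, 0.0],
})

# Fraction of rows equal to zero, highest first
zero_share = (sample == 0).mean().sort_values(ascending=False)
print(zero_share)
```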

In [None]:
# Boxplots
plt.figure(figsize=(20, 15))
sns.boxplot(data=df)
plt.xticks(rotation=90)
plt.show()
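By default, seaborn's boxplot draws whiskers at 1.5 × IQR (`whis=1.5`), and points beyond them are drawn as outliers. The sketch below counts those points per column using the same rule. The data is invented. Note the second column: for a mostly-zero feature the IQR is 0, so every non-zero value counts as an outlier.

```python
import pandas as pd

# Hypothetical values: one clear high point in ASTV, a mostly-zero DP
sample = pd.DataFrame({
    'ASTV': [40, 42, 45, 47, 50, 52, 55, 90],
    'DP':   [0, 0, 0, 0, 0, 0, 0, 0.002],
})

q1 = sample.quantile(0.25)
q3 = sample.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Per-column count of points beyond the whiskers
outliers = ((sample < lower) | (sample > upper)).sum()
print(outliers)
```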

In [None]:
# Correlation heatmap
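plt.figure(figsize=(14, 10))
# numeric_only=True skips any non-numeric columns, which would otherwise raise in pandas 2.x
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f', cmap='coolwarm')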
plt.title('Correlation Heatmap')
plt.show()
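With many features, the annotated heatmap is hard to scan. One option is to list the feature pairs ranked by absolute correlation. The sketch below does this on a hypothetical three-column frame (`a`, `b`, `c` are placeholder names). It reads each pair once from the upper triangle of the matrix.

```python
import pandas as pd

# Hypothetical frame: b is a scaled copy of a, c mostly moves opposite to a
sample = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.0, 4.0, 6.0, 8.0, 10.0],
    'c': [5.0, 3.0, 4.0, 1.0, 2.0],
})

corr = sample.corr(numeric_only=True)
cols = corr.columns

# Upper triangle only (j > i), so each pair appears once
pairs = pd.Series({(cols[i], cols[j]): corr.iloc[i, j]
                   for i in range(len(cols))
                   for j in range(i + 1, len(cols))})

# Strongest relationships first, regardless of sign
top = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(top.head(10))
```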

## 5. 🔍 Key Insights and Summary
- Baseline fetal heart rate (`LB`) averages roughly 130 bpm.
- `ASTV` and `ALTV` differ noticeably across the fetal health classes (`NSP`).
- Many `DL` and `DP` values are 0. The few non-zero values show up as outliers and may indicate fetal stress.
- `NSP` is imbalanced: most records fall in the normal class.
- The correlation heatmap shows strong correlations between `MLTV`, `ALTV`, and several other features.
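The class imbalance and the per-class differences in `ASTV` and `ALTV` described above can be quantified with a class-share table and per-class medians. The sketch below uses invented rows, with `sample` standing in for `df`. It assumes the usual NSP coding of 1 = normal, 2 = suspect, 3 = pathologic.

```python
import pandas as pd

# Hypothetical rows; NSP codes: 1 = normal, 2 = suspect, 3 = pathologic
sample = pd.DataFrame({
    'NSP':  [1, 1, 1, 1, 1, 1, 2, 2, 3],
    'ASTV': [20, 25, 30, 22, 28, 35, 60, 65, 70],
    'ALTV': [0, 2, 5, 1, 3, 4, 20, 25, 40],
})

# Share of records in each class
class_share = sample['NSP'].value_counts(normalize=True).sort_index()

# Median of each variability feature within each class
by_class = sample.groupby('NSP')[['ASTV', 'ALTV']].median()

print(class_share.round(3))
print(by_class)
```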
