A complete learning repository covering Exploratory Data Analysis (EDA) from theory to practice — created specially for students to master data understanding, cleaning, and visualization techniques in Python.
This repository serves as a comprehensive guide to learning EDA both conceptually and practically.
It contains two main components:
- 🧾 Theory File: Explains every EDA concept including data types, summary statistics, missing values, outliers, correlation, distributions, and visualization techniques.
- 💻 Practical Notebook: A complete hands-on EDA project using the Titanic dataset (from Seaborn), demonstrating every concept step-by-step in Python.
This repository helps students connect theory with real implementation, making EDA easy and engaging to learn.
✅ Understanding different types of data
✅ Handling missing and duplicate values
✅ Detecting and treating outliers
✅ Exploring numerical and categorical features
✅ Analyzing correlations and feature relationships
✅ Visualizing data with Matplotlib and Seaborn
✅ Drawing meaningful insights and EDA summaries
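As a rough illustration of the cleaning and outlier topics listed above, here is a minimal sketch using only Pandas and NumPy. The tiny DataFrame is made-up toy data with Titanic-like column names (an assumption, so the snippet runs offline), and the helper names `clean` and `iqr_outliers` are hypothetical, not taken from the notebooks. The 1.5 × IQR rule is one common convention for flagging outliers, not the only one.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data (made-up values, Titanic-like column names).
df = pd.DataFrame({
    "age":  [22.0, 38.0, np.nan, 35.0, 35.0, 2.0, 80.0],
    "fare": [7.25, 71.28, 7.92, 8.05, 8.05, 21.08, 512.33],
    "sex":  ["male", "female", "female", "male", "male", "female", "male"],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill missing ages with the median age."""
    out = frame.drop_duplicates().copy()
    out["age"] = out["age"].fillna(out["age"].median())
    return out


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


cleaned = clean(df)

print(df.isna().sum())                          # missing values per column, before cleaning
print(len(df) - len(cleaned))                   # number of duplicate rows removed
print(cleaned[iqr_outliers(cleaned["fare"])])   # rows flagged as fare outliers
```

Whether to drop, cap, or keep flagged outliers depends on the question being asked; a very high fare may be a genuine first-class ticket rather than an error.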
Dataset: Titanic (available in Seaborn library)
```python
import seaborn as sns

titanic = sns.load_dataset('titanic')
```

The Titanic dataset is ideal for practicing EDA: it involves passenger survival data and helps learners explore relationships between features like age, gender, class, and survival status.
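To make the idea of exploring feature relationships concrete, the sketch below computes survival rates by group and a correlation matrix with Pandas. It uses a small made-up DataFrame (an assumption, to keep the example runnable without a download) whose column names `survived`, `pclass`, `sex`, `age`, and `fare` match those in Seaborn's Titanic dataset. The numbers are illustrative only and say nothing about the real data.

```python
import pandas as pd

# Toy stand-in (made-up values) using column names from Seaborn's Titanic dataset.
df = pd.DataFrame({
    "survived": [0, 1, 0, 1, 0, 1, 1, 0],
    "pclass":   [3, 1, 3, 1, 3, 2, 2, 3],
    "sex":      ["male", "female", "female", "female",
                 "male", "male", "female", "male"],
    "age":      [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 27.0, 19.0],
    "fare":     [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 11.13, 8.05],
})

# Categorical feature vs. target: the mean of a 0/1 column is the survival rate.
rate_by_sex = df.groupby("sex")["survived"].mean()

# Two categorical features at once: rows are passenger class, columns are sex.
rate_by_class_sex = df.pivot_table(
    index="pclass", columns="sex", values="survived", aggfunc="mean"
)

# Numerical features: pairwise Pearson correlation.
corr = df[["survived", "pclass", "age", "fare"]].corr()

print(rate_by_sex)
print(rate_by_class_sex)
print(corr.round(2))
```

With the real dataset, replacing `df` with `sns.load_dataset('titanic')` should run the same steps, since these column names exist there.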
- Python 3.x
- Pandas – Data handling and cleaning
- NumPy – Numerical operations
- Matplotlib – Visualization
- Seaborn – Statistical graphics
- Jupyter Notebook – Interactive code execution
EDA-Theory-and-Practice/
│
├── 📘 EDA_Method_Theory.ipynb # Complete EDA theory notes
├── 💻 EDA_Method_Practise.ipynb # Practical EDA notebook
└── LICENSE # MIT License file
- Clone this repository: `git clone https://github.com/yourusername/Exploratory-Data-Analysis-Tutorial.git`
- Open the practical notebook (`EDA_Method_Practise.ipynb`) in Jupyter.
- Run each cell and follow the step-by-step EDA workflow.
- Refer to `EDA_Method_Theory.ipynb` for theoretical explanations.
- Use the outputs and plots to interpret and summarize your findings.
By the end of this module, you will:
- Understand EDA principles thoroughly
- Be able to clean and analyze raw data efficiently
- Visualize relationships and patterns effectively
- Gain confidence in preparing datasets for machine learning
This project prepares you for real-world data science tasks and interview-level EDA questions.
Vishnu V Unnikrishnan

📍 Data Science & AI Faculty | IPCS Global, Bangalore

💬 Dedicated to teaching Data Science, Machine Learning, and AI through hands-on, project-based learning.
This project is licensed under the MIT License — feel free to use, share, and modify for learning or educational purposes, with proper attribution.
