๐งผ Layoff Data Cleaning for EDA (SQL Project)
๐ Overview
This project focuses on cleaning a messy global layoff dataset from Kaggle to prepare it for Exploratory Data Analysis (EDA).
The goal was to fix duplicates, standardize data formats, handle missing values, and tidy up the structure โ so the data is ready for real analysis.
๐ About This Project
It helped me sharpen my hands-on skills with data preparation and gave me confidence working with messy, unstructured data.
๐๏ธ Dataset
Source: https://www.kaggle.com/datasets/swaptr/layoffs-2022
Type: Global layoffs, 2020โ2022
Includes: Company name, location, industry, total laid off, percentage laid off, funding raised, stage, and date
๐ง What I Did
โ Created a safe staging table to protect the original dataset
โ Removed duplicates using ROW_NUMBER() and CTEs
โ Standardized text fields (company, industry, location)
โ Converted date column from text to SQL DATE format
โ Handled missing values by converting blanks to NULL and filling missing values using self joins
โ Dropped unreliable rows/columns to clean the dataset thoroughly
๐ง Tools & Skills Used
SQL (MySQL) for data cleaning and manipulation
Techniques:
Window Functions (ROW_NUMBER())
CTEs (Common Table Expressions)
Text trimming & standardization
Date conversion
NULL handling and logic
๐ก What I Learned
How to clean messy real-world data in SQL
How to write clear, well-commented SQL scripts step by step
How to prepare datasets for actual EDA work โ not just theory