uday3421-DA/SQL_PROJECT

🧼 Layoff Data Cleaning for EDA (SQL Project)

📌 Overview

This project focuses on cleaning a messy global layoff dataset from Kaggle to prepare it for Exploratory Data Analysis (EDA).

The goal was to remove duplicates, standardize data formats, handle missing values, and tidy up the structure so that the data is ready for analysis.

About This Project

This project helped me sharpen my hands-on data preparation skills and gave me confidence working with messy, unstructured data.

🗂️ Dataset

Source: https://www.kaggle.com/datasets/swaptr/layoffs-2022

Type: Global layoffs, 2020–2022

Includes: Company name, location, industry, total laid off, percentage laid off, funding raised, stage, and date

🔧 What I Did

✅ Created a safe staging table to protect the original dataset

✅ Removed duplicates using ROW_NUMBER() and CTEs

✅ Standardized text fields (company, industry, location)

✅ Converted date column from text to SQL DATE format

✅ Handled missing values by converting blanks to NULL and filling missing values using self joins

✅ Dropped unreliable rows/columns to clean the dataset thoroughly
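
The staging and de-duplication steps above can be sketched as follows. This is a minimal illustration, not the project's actual script: it runs the SQL through Python's built-in sqlite3 module so it works without a MySQL server, and the table names (layoffs, layoffs_staging), the reduced column set, and the sample rows are all assumptions made for the example. MySQL 8 supports the same ROW_NUMBER() window function and CTE syntax, but it has no rowid, so the delete step there would need a different approach (for example, a second staging table that stores the row number).

```python
import sqlite3

# Hypothetical sample rows; the second row is an exact duplicate of the first.
RAW_ROWS = [
    ("Acme", "Austin", "Retail", 50, "3/1/2022"),
    ("Acme", "Austin", "Retail", 50, "3/1/2022"),
    ("Globex", "Berlin", "Finance", 120, "11/15/2022"),
]


def dedupe_layoffs(rows):
    """Load rows, copy them to a staging table, and delete exact duplicates.

    Returns (row count of the untouched raw table, cleaned staging rows).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE layoffs (company TEXT, location TEXT, industry TEXT, '
        'total_laid_off INTEGER, "date" TEXT)'
    )
    conn.executemany("INSERT INTO layoffs VALUES (?, ?, ?, ?, ?)", rows)

    # Staging copy: all cleaning happens here, the raw table is never modified.
    conn.execute("CREATE TABLE layoffs_staging AS SELECT * FROM layoffs")

    # Number the rows inside each group of identical values; any row with
    # row_num > 1 is a duplicate of an earlier row and gets deleted.
    conn.execute(
        """
        WITH ranked AS (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY company, location, industry,
                                    total_laid_off, "date"
                       ORDER BY rowid
                   ) AS row_num
            FROM layoffs_staging
        )
        DELETE FROM layoffs_staging
        WHERE rowid IN (SELECT rid FROM ranked WHERE row_num > 1)
        """
    )

    original_count = conn.execute("SELECT COUNT(*) FROM layoffs").fetchone()[0]
    cleaned = conn.execute(
        "SELECT company, total_laid_off FROM layoffs_staging ORDER BY company"
    ).fetchall()
    conn.close()
    return original_count, cleaned
```

Calling dedupe_layoffs(RAW_ROWS) leaves the raw table with all three rows and the staging table with one row per company, which is the point of working on a staging copy.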

🧠 Tools & Skills Used

SQL (MySQL) for data cleaning and manipulation

Techniques:

Window Functions (ROW_NUMBER())

CTEs (Common Table Expressions)

Text trimming & standardization

Date conversion

NULL handling and logic
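
The trimming, date conversion, and NULL handling techniques listed above can be illustrated with a short sketch. As before, this is a hedged example rather than the project's code: it uses Python's built-in sqlite3 module, and the table layout, the sample rows, the M/D/YYYY text date format, and the helper to_iso_date are all assumptions. In MySQL the date step would typically use STR_TO_DATE and the fill step an UPDATE with a self join; here the fill is written as a correlated subquery on the same table, which has the same effect.

```python
import sqlite3
from datetime import datetime

# Hypothetical messy rows: stray whitespace, a blank industry, text dates.
MESSY_ROWS = [
    ("  Acme ", "Retail", "3/1/2022"),
    ("Acme", "", "11/15/2022"),
    ("Globex", " Finance", "1/9/2022"),
]


def to_iso_date(text):
    """Convert an assumed 'M/D/YYYY' string to ISO 'YYYY-MM-DD'; keep NULL as NULL."""
    if text is None:
        return None
    return datetime.strptime(text, "%m/%d/%Y").date().isoformat()


def clean_staging(rows):
    """Trim text, turn blanks into NULL, fill NULLs from sibling rows, fix dates."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python helper to SQL (stands in for MySQL's STR_TO_DATE).
    conn.create_function("to_iso_date", 1, to_iso_date)
    conn.execute(
        'CREATE TABLE layoffs_staging (company TEXT, industry TEXT, "date" TEXT)'
    )
    conn.executemany("INSERT INTO layoffs_staging VALUES (?, ?, ?)", rows)

    # 1. Standardize text by stripping leading and trailing whitespace.
    conn.execute(
        "UPDATE layoffs_staging "
        "SET company = TRIM(company), industry = TRIM(industry)"
    )
    # 2. Convert blank strings to NULL so missing values are explicit.
    conn.execute("UPDATE layoffs_staging SET industry = NULL WHERE industry = ''")
    # 3. Fill a missing industry from another row of the same company.
    conn.execute(
        """
        UPDATE layoffs_staging
        SET industry = (
            SELECT t2.industry
            FROM layoffs_staging AS t2
            WHERE t2.company = layoffs_staging.company
              AND t2.industry IS NOT NULL
            LIMIT 1
        )
        WHERE industry IS NULL
        """
    )
    # 4. Rewrite text dates in a sortable ISO format.
    conn.execute('UPDATE layoffs_staging SET "date" = to_iso_date("date")')

    result = conn.execute(
        'SELECT company, industry, "date" FROM layoffs_staging ORDER BY "date"'
    ).fetchall()
    conn.close()
    return result
```

The order of the steps matters: trimming first means that a value containing only spaces becomes an empty string, which step 2 then converts to NULL, which step 3 can then fill.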

💡 What I Learned

How to clean messy real-world data in SQL

How to write clear, well-commented SQL scripts step by step

How to prepare datasets for actual EDA work, not just theory
