In this project, we focus on cleaning a raw Excel dataset titled "layoffs" and performing Exploratory Data Analysis (EDA) using SQL. The main objective is to prepare the dataset for further analysis by addressing issues such as missing values, duplicates, and irrelevant data, and then exploring the data to uncover useful insights.
- Data Cleaning: Using SQL queries, we apply various techniques to clean the raw dataset, ensuring data integrity and quality.
- Exploratory Data Analysis (EDA): SQL queries are then used to explore the dataset, uncover patterns, and generate insights that can guide further analysis or decision-making.
- layoffs.xlsx: The raw dataset containing information related to layoffs.
- DataCleaning_queries.sql: A set of SQL queries focused on cleaning the raw dataset by handling missing values, duplicates, and irrelevant columns.
- EDA_queries.sql: A set of SQL queries used to analyze the cleaned data, focusing on uncovering trends, summarizing key metrics, and surfacing patterns in the data.
The queries in DataCleaning_queries.sql perform several operations to clean the raw dataset:
- Removing duplicate records
- Handling missing or null values in crucial columns
- Standardizing column names and data types
- Filtering out irrelevant or incomplete records
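
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of DataCleaning_queries.sql: it runs the SQL through Python's built-in `sqlite3` module so it is self-contained, and the table names (`layoffs_raw`, `layoffs_clean`), the column names, and the sample rows are all hypothetical stand-ins for the real dataset.

```python
import sqlite3

# Hypothetical schema and sample rows; the real layoffs columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_raw (
    company TEXT, industry TEXT, total_laid_off TEXT, layoff_date TEXT
);
INSERT INTO layoffs_raw VALUES
  (' Acme ', 'Tech',   '100', '2023-01-15'),
  (' Acme ', 'Tech',   '100', '2023-01-15'),  -- exact duplicate
  ('Beta',   '',       '50',  '2023-02-01'),  -- blank industry
  ('Gamma',  'Retail', NULL,  '2023-03-10');  -- missing layoff count
""")

conn.executescript("""
-- Build a cleaned copy so the raw table stays untouched.
-- DISTINCT removes exact duplicate rows, TRIM strips stray whitespace,
-- NULLIF turns blank strings into NULL, CAST fixes the data type.
CREATE TABLE layoffs_clean AS
SELECT DISTINCT
    TRIM(company)                   AS company,
    NULLIF(TRIM(industry), '')      AS industry,
    CAST(total_laid_off AS INTEGER) AS total_laid_off,
    layoff_date
FROM layoffs_raw;

-- Filter out records that lack the key metric.
DELETE FROM layoffs_clean WHERE total_laid_off IS NULL;
""")

rows = conn.execute(
    "SELECT company, industry, total_laid_off FROM layoffs_clean ORDER BY company"
).fetchall()
print(rows)
```

Working on a copy rather than the raw table is a design choice in this sketch: it keeps the original import available if a cleaning rule turns out to be too aggressive.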
The queries in EDA_queries.sql perform in-depth analysis to derive actionable insights:
- Identifying correlations and patterns between variables
- Generating statistical summaries of key columns
- Surfacing trends, distributions, and outliers in the data
- Segmenting the data to explore various subgroups and trends
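
As a rough illustration of the kinds of analysis listed above, the sketch below computes a statistical summary, a per-segment total, and a monthly trend. It is not taken from EDA_queries.sql: the `layoffs_clean` table, its columns, and the sample rows are assumptions made for the example, and Python's built-in `sqlite3` module is used only to make the SQL runnable on its own.

```python
import sqlite3

# Hypothetical cleaned table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_clean (
    company TEXT, industry TEXT, total_laid_off INTEGER, layoff_date TEXT
);
INSERT INTO layoffs_clean VALUES
  ('Acme',  'Tech',   100, '2022-11-03'),
  ('Beta',  'Tech',    50, '2023-01-20'),
  ('Gamma', 'Retail',  30, '2023-01-25'),
  ('Acme',  'Tech',    20, '2023-02-10');
""")

# Statistical summary of a key column.
summary = conn.execute("""
    SELECT COUNT(*), MIN(total_laid_off), MAX(total_laid_off), SUM(total_laid_off)
    FROM layoffs_clean
""").fetchone()

# Segmenting: total layoffs per industry, largest first.
by_industry = conn.execute("""
    SELECT industry, SUM(total_laid_off) AS total
    FROM layoffs_clean
    GROUP BY industry
    ORDER BY total DESC
""").fetchall()

# Trend over time: totals per month (dates assumed to be ISO 'YYYY-MM-DD' text).
by_month = conn.execute("""
    SELECT substr(layoff_date, 1, 7) AS month, SUM(total_laid_off) AS total
    FROM layoffs_clean
    GROUP BY month
    ORDER BY month
""").fetchall()

print(summary)
print(by_industry)
print(by_month)
```

The monthly grouping relies on dates being stored as ISO-formatted text; if the cleaned data uses a different date format, the `substr` call would need to change accordingly.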
The goal of this project is to transform a messy, unorganized raw dataset into a clean, structured format suitable for detailed analysis. By applying SQL-based data cleaning and performing thorough exploratory data analysis, we can extract valuable insights that could inform decision-making processes.
- SQL: For data cleaning, transformation, and analysis
- Excel: To import and view the raw dataset
This project demonstrates how SQL can be used effectively to clean and analyze raw data. By performing both data cleaning and exploratory data analysis, we make the dataset ready for deeper analysis and potential use in decision-making or machine learning applications.