
In this project, a raw Excel file named layoffs is cleaned and then explored with exploratory data analysis (EDA) using SQL.


vasukrishna001/DataCleaning_and_EDA_usingSQL


Data Cleaning and Exploratory Data Analysis (EDA) using SQL

Project Overview

In this project, we clean a raw Excel dataset named "layoffs" and then perform Exploratory Data Analysis (EDA) on it using SQL. The main objective is to prepare the dataset for further analysis by handling missing values, duplicates, and irrelevant data, and then to explore the cleaned data to uncover useful insights.

Key Components:

  • Data Cleaning: Using SQL queries, we apply various techniques to clean the raw dataset, ensuring data integrity and quality.
  • Exploratory Data Analysis (EDA): SQL queries are then used to explore the dataset, uncover patterns, and generate insights that can guide further analysis or decision-making.

Files Involved:

  • layoffs.xlsx: The raw dataset containing information related to layoffs.
  • DataCleaning_queries.sql: A set of SQL queries focused on cleaning the raw dataset by handling missing values, duplicates, and irrelevant columns.
  • EDA_queries.sql: A set of SQL queries used to analyze the cleaned data, focusing on uncovering trends, summarizing key metrics, and surfacing patterns in the data.

Project Steps

1. Data Cleaning:

The queries in DataCleaning_queries.sql perform several operations to clean the raw dataset:

  • Removing duplicate records
  • Handling missing or null values in crucial columns
  • Standardizing column names and data types
  • Filtering out irrelevant or incomplete records
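The cleaning steps above can be sketched as follows. This is a minimal illustration and not the contents of DataCleaning_queries.sql: the table names (layoffs_raw, layoffs_clean), the columns (company, industry, total_laid_off, layoff_date), and the sample rows are all assumptions made for the example, and the SQL is run through Python's built-in sqlite3 module only so that it is self-contained. Other SQL dialects may need small syntax changes.

```python
import sqlite3

# Hypothetical schema and sample rows; the real layoffs.xlsx columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_raw (
    company        TEXT,
    industry       TEXT,
    total_laid_off TEXT,   -- kept as text to mimic an untyped spreadsheet import
    layoff_date    TEXT
);
INSERT INTO layoffs_raw VALUES
    ('  Acme ', 'Retail', '100', '2023-01-15'),
    ('  Acme ', 'Retail', '100', '2023-01-15'),   -- exact duplicate
    ('Globex',  '',       '250', '2023-02-01'),   -- blank industry
    ('Globex',  'Tech',   '40',  '2023-03-10'),
    ('Initech', 'Tech',   NULL,  '2023-03-12');   -- no layoff count

-- Work on a staging copy so the raw table stays untouched.
-- DISTINCT drops exact duplicates, TRIM standardizes company names,
-- NULLIF turns blank strings into NULL, CAST fixes the numeric type.
CREATE TABLE layoffs_clean AS
SELECT DISTINCT
    TRIM(company)                   AS company,
    NULLIF(TRIM(industry), '')      AS industry,
    CAST(total_laid_off AS INTEGER) AS total_laid_off,
    layoff_date
FROM layoffs_raw;

-- Fill a missing industry from another row of the same company, if one exists.
UPDATE layoffs_clean
SET industry = (
    SELECT MAX(c2.industry)
    FROM layoffs_clean AS c2
    WHERE c2.company = layoffs_clean.company
      AND c2.industry IS NOT NULL
)
WHERE industry IS NULL;

-- Drop rows that carry no usable layoff figure.
DELETE FROM layoffs_clean WHERE total_laid_off IS NULL;
""")

cleaned_rows = conn.execute(
    "SELECT company, industry, total_laid_off, layoff_date "
    "FROM layoffs_clean ORDER BY company, layoff_date"
).fetchall()
for row in cleaned_rows:
    print(row)
```

Cleaning into a separate staging table is a design choice in this sketch: it keeps the raw import available for comparison if a cleaning rule turns out to be too aggressive.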

2. Exploratory Data Analysis (EDA):

The queries in EDA_queries.sql analyze the cleaned data to derive actionable insights:

  • Identifying correlations and patterns between variables
  • Generating statistical summaries of key columns
  • Surfacing trends, distributions, and outliers in the data
  • Segmenting the data to explore various subgroups and trends
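As a rough illustration of the kinds of queries listed above, the sketch below computes a statistical summary, a per-industry segmentation, and a monthly trend with a running total. It is not taken from EDA_queries.sql: the layoffs_clean table, its columns, and the sample rows are hypothetical, and Python's sqlite3 module is used only to make the SQL runnable on its own. The running total uses a window function, which assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical cleaned table; column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE layoffs_clean (
    company        TEXT,
    industry       TEXT,
    total_laid_off INTEGER,
    layoff_date    TEXT      -- ISO format: YYYY-MM-DD
);
INSERT INTO layoffs_clean VALUES
    ('Acme',   'Retail', 100, '2023-01-15'),
    ('Globex', 'Tech',   250, '2023-02-01'),
    ('Globex', 'Tech',    40, '2023-03-10'),
    ('Hooli',  'Tech',    60, '2023-03-20');
""")

# Statistical summary of a key column: row count, min, max, and total.
summary = conn.execute("""
    SELECT COUNT(*), MIN(total_laid_off), MAX(total_laid_off), SUM(total_laid_off)
    FROM layoffs_clean
""").fetchone()

# Segmentation: total layoffs per industry, largest first.
by_industry = conn.execute("""
    SELECT industry, SUM(total_laid_off) AS total
    FROM layoffs_clean
    GROUP BY industry
    ORDER BY total DESC
""").fetchall()

# Trend: layoffs per month plus a running total across months.
monthly = conn.execute("""
    SELECT month,
           monthly_total,
           SUM(monthly_total) OVER (ORDER BY month) AS running_total
    FROM (
        SELECT substr(layoff_date, 1, 7) AS month,
               SUM(total_laid_off)       AS monthly_total
        FROM layoffs_clean
        GROUP BY substr(layoff_date, 1, 7)
    )
    ORDER BY month
""").fetchall()

print(summary)
print(by_industry)
print(monthly)
```

The result sets are plain rows; charting them would require a separate tool, since SQL itself only returns tabular output.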

Objective:

The goal of this project is to transform a messy, unorganized raw dataset into a clean, structured format suitable for detailed analysis. By applying SQL-based data cleaning and performing thorough exploratory data analysis, we can extract valuable insights that could inform decision-making processes.


Technologies Used:

  • SQL: For data cleaning, transformation, and analysis
  • Excel: To import and view the raw dataset

Conclusion:

This project demonstrates how SQL can be used effectively to clean and analyze raw data. By performing both data cleaning and exploratory data analysis, we make the dataset ready for deeper analysis and potential use in decision-making or machine learning applications.
