<h1 style="text-align:center; font-size:36px;">
HR Insights: Tracking Workforce Trends</h1>


***Project Overview***



***```Domain```***: Human Resources Analytics / Workforce Analytics

***```Objective```***:

The objective of this project is to analyze employee demographic characteristics, job roles, compensation structure, work experience, and satisfaction levels to identify the key factors influencing employee attrition. The analysis focuses on understanding behavioral and organizational patterns that distinguish employees who leave the organization from those who remain, supporting data-driven retention strategies.

***```Business Problem```***:

Employee turnover poses a significant challenge for organizations. High attrition rates lead to increased recruitment and training costs, loss of experienced talent, reduced productivity, and disruptions in team stability. Many organizations lack clear insights into the underlying factors that contribute to employee dissatisfaction and voluntary exits.

By identifying the drivers of employee attrition, organizations can:

- Improve employee engagement and job satisfaction

- Design effective and targeted retention strategies

- Optimize compensation, career progression, and work-life balance initiatives

- Reduce operational costs associated with frequent hiring

This project applies exploratory data analysis (EDA) techniques to uncover patterns and trends in employee data that explain attrition behavior and support proactive HR decision-making.

***```Dataset Description```***:

The analysis is conducted using the HR Insights: Tracking Workforce Trends Dataset, which contains detailed employee-level data from a corporate environment.

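Key characteristics of the dataset:

- Total Records: 23,530

- Total Columns: 37 (as reported by `data.shape` in Step 4)

- Data Type: Structured, tabular dataset

- Target Variable: Attrition, stored as three employment-status labels (for example, "Current employee" and "Voluntary Resignation") rather than literal Yes / No values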
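The raw `Attrition` column stores employment-status labels (for example, "Current employee" and "Voluntary Resignation") rather than literal Yes / No values, so a binary target has to be derived before attrition can be analyzed as Yes / No. A minimal sketch, assuming every non-missing status other than "Current employee" counts as attrition; the helper `to_binary_attrition` and the toy `sample` frame are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the real data; uses only label strings seen in this notebook.
sample = pd.DataFrame(
    {"Attrition": ["Current employee", "Voluntary Resignation", None, "Current employee"]}
)


def to_binary_attrition(status: pd.Series, stayed_label: str = "Current employee") -> pd.Series:
    """Map status labels to 'Yes'/'No' (hypothetical helper).

    Assumption: any non-missing status other than `stayed_label` counts as attrition.
    Missing statuses stay missing instead of being guessed.
    """
    flag = status.ne(stayed_label).map({True: "Yes", False: "No"})
    return flag.where(status.notna())


sample["AttritionFlag"] = to_binary_attrition(sample["Attrition"])
print(sample)
```

Whether involuntary exits should also count as attrition is an analysis decision; changing `stayed_label` or the mapping is where that choice would go.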

<h1 style="font-size:22px;">PHASE 1: DATA LOADING & INITIAL OVERVIEW</h1>


```EXECUTIVE SUMMARY```

The data loading and initial overview phase imports the dataset and establishes its basic structure, size, and variables. It identifies data types, missing values, and the overall composition of the data, providing a foundation for later cleaning and analysis.

```PHASE 1: DATA LOADING & INITIAL OVERVIEW – SUMMARY```

This phase focuses on loading the HR dataset and understanding its structure and basic characteristics.

Using:

- Data loading techniques

- Dataset shape and dimension checks

- Data type inspection

- Initial data preview (head & tail)

- Basic descriptive statistics

this phase provides a high-level understanding of the dataset and identifies potential data quality issues to guide further cleaning and analysis.
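The checks listed above can be bundled into one helper so they run together on any DataFrame. This is a sketch only: `quick_overview` is a hypothetical name, and the small `toy` frame stands in for `data`.

```python
import pandas as pd


def quick_overview(df: pd.DataFrame, n_preview: int = 3) -> dict:
    """Collect the Phase 1 checks in one place (hypothetical helper)."""
    return {
        "shape": df.shape,                          # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),  # column -> dtype name
        "missing": df.isna().sum().to_dict(),       # column -> number of nulls
        "head": df.head(n_preview),                 # first rows
        "tail": df.tail(n_preview),                 # last rows
        "describe": df.describe(include="all"),     # numeric + categorical summary
    }


# Toy frame standing in for the HR data.
toy = pd.DataFrame(
    {
        "Age": [41.0, 37.0, None, 29.0],
        "Department": ["Sales", "Human Resources", "Sales", None],
    }
)
report = quick_overview(toy)
print(report["shape"])
print(report["missing"])
```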

>STEP 1: IMPORT NECESSARY LIBRARIES

Objective: Load essential Python libraries required for data handling and inspection.

In [1]:
import pandas as pd

In [2]:
import numpy as np


>STEP 2: LOAD THE DATASET

***2.1 Data Source and Loading***

Data Source:

The dataset used in this analysis is the HR Insights: Tracking Workforce Trends Dataset (extended version) available on Kaggle.

***Source Link***:
https://www.kaggle.com/datasets/dgokeeffe/ibm-hr-wmore-rows

This dataset contains HR records for employees across various departments and roles, with a focus on understanding attrition patterns.

***Dataset Description***

The HR Insights: Tracking Workforce Trends dataset includes detailed information for a large number of employees, capturing demographic, job, compensation, performance, and engagement-related attributes. It is commonly used to analyze the factors that contribute to employee attrition.

***Loading Objectives***

The goal of this step is to import the dataset and perform an initial overview to understand its structure before detailed analysis. The objectives include:

- Identifying the total number of records (rows), each representing one employee record

- Determining the total number of features (columns), each representing an employee attribute

- Reviewing column names and data types

- Assessing the general completeness and structure of the dataset
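Completeness can be expressed as the percentage of non-null values per column, plus the number of rows with no gaps at all. A short sketch on a toy frame (an assumption standing in for `data`):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Age": [41.0, 37.0, None, 29.0],
        "Attrition": ["Voluntary Resignation", "Current employee", "Current employee", None],
        "EmployeeNumber": ["1", "2", "3", "4"],
    }
)

# Share of non-null values per column, as a percentage.
completeness = toy.notna().mean().mul(100).round(2)

# Rows with no missing value in any column.
complete_rows = int(toy.notna().all(axis=1).sum())

print(f"{toy.shape[0]} rows x {toy.shape[1]} columns")
print(completeness.sort_values())
print("fully complete rows:", complete_rows)
```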

In [3]:
data = pd.read_csv("IBM_HR_Data.csv", low_memory=False)
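`low_memory=False` makes pandas infer column types from the whole file instead of chunk by chunk, which avoids the mixed-type `DtypeWarning` that chunked parsing can raise. Declaring `dtype` for ambiguous columns is the other documented way to avoid mixed types. The sketch below assumes the identifier columns `EmployeeNumber` and `Application ID` should be kept as text; the inline CSV is a hypothetical stand-in for `IBM_HR_Data.csv`.

```python
import io

import pandas as pd

# Tiny inline CSV standing in for IBM_HR_Data.csv (illustration only).
csv_text = """Age,Attrition,EmployeeNumber,Application ID
41,Voluntary Resignation,1,123456
37,Current employee,2,
"""

# Assumption: identifiers are treated as text, so they are never parsed as numbers.
id_dtypes = {"EmployeeNumber": "string", "Application ID": "string"}

data_small = pd.read_csv(io.StringIO(csv_text), dtype=id_dtypes, low_memory=False)
print(data_small.dtypes)
```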

>STEP 3: DATA TYPE INSPECTION

Objective: Understand the data types and identify potential data quality issues.

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23530 entries, 0 to 23529
Data columns (total 37 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Age                       23527 non-null  float64
 1   Attrition                 23517 non-null  object 
 2   BusinessTravel            23522 non-null  object 
 3   DailyRate                 23519 non-null  float64
 4   Department                23519 non-null  object 
 5   DistanceFromHome          23521 non-null  float64
 6   Education                 23518 non-null  float64
 7   EducationField            23521 non-null  object 
 8   EmployeeCount             23525 non-null  float64
 9   EmployeeNumber            23530 non-null  object 
 10  Application ID            23527 non-null  object 
 11  EnvironmentSatisfaction   23521 non-null  float64
 12  Gender                    23520 non-null  object 
 13  HourlyRate                23521 non-null  float64
 14  JobInv

Insight:

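- Reveals column data types: numeric fields such as `Age` and `DailyRate` are stored as `float64`, likely because missing values prevent an integer dtype
- Highlights missing values: nearly every column listed has a handful of nulls (for example, `Attrition` has 23,517 non-null values out of 23,530, so 13 are missing)
- Identifies columns requiring type conversion: `EmployeeNumber` is stored as `object` even though it holds numeric-looking identifiers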
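The missing-value and type-conversion checks can be automated: count nulls per column, then flag non-numeric columns whose non-null values all parse as numbers. A sketch on a toy frame; `numeric_like_columns` is a hypothetical helper, and columns it reports are only candidates for `pd.to_numeric`, not a confirmed fix.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Age": [41.0, None, 29.0],
        "EmployeeNumber": ["1", "4", "5"],  # numeric-looking IDs stored as text
        "Department": ["Sales", "Human Resources", None],
    }
)

# Missing values per column, largest first.
missing = toy.isna().sum().sort_values(ascending=False)


def numeric_like_columns(df: pd.DataFrame) -> list:
    """Return non-numeric columns whose non-null values all parse as numbers (hypothetical helper)."""
    candidates = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # already numeric, nothing to convert
        values = df[col].dropna()
        converted = pd.to_numeric(values, errors="coerce")  # unparseable -> NaN
        if len(values) > 0 and converted.notna().all():
            candidates.append(col)
    return candidates


print(missing)
print(numeric_like_columns(toy))
```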

>STEP 4: DATASET DIMENSIONS

Objective: Identify the size of the dataset.

In [5]:
print("shape:", data.shape)

shape: (23530, 37)


Insight:

- Confirms the dataset has 23,530 rows and 37 columns
- Helps assess whether the dataset meets minimum project requirements
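The notebook does not state the project's minimum size requirements, so the thresholds below are hypothetical placeholders. The sketch shows how the `(rows, columns)` tuple from `shape` can be unpacked and compared against such limits.

```python
import pandas as pd

# Hypothetical minimums; replace with the project's actual requirements.
MIN_ROWS = 1_000
MIN_COLS = 10


def meets_minimum_size(df: pd.DataFrame, min_rows: int = MIN_ROWS, min_cols: int = MIN_COLS) -> bool:
    """Return True if df has at least min_rows rows and min_cols columns."""
    n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
    return n_rows >= min_rows and n_cols >= min_cols


toy = pd.DataFrame({"a": range(1_500), "b": 0})
print(toy.shape, meets_minimum_size(toy))
```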

>STEP 5: PREVIEW DATA RECORDS

Objective: Examine sample records to understand variable content.

In [6]:
display(data.head(5))

| | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | ... | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | Employee Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41.0 | Voluntary Resignation | Travel_Rarely | 1102.0 | Sales | 1.0 | 2.0 | Life Sciences | 1.0 | 1 | ... | 80.0 | 0.0 | 8.0 | 0.0 | 1.0 | 6.0 | 4.0 | 0.0 | 5.0 | Referral |
| 1 | 37.0 | Voluntary Resignation | Travel_Rarely | 807.0 | Human Resources | 6.0 | 4.0 | Human Resources | 1.0 | 1 | ... | 80.0 | 0.0 | 8.0 | 0.0 | 1.0 | 6.0 | 4.0 | 0.0 | 5.0 | Referral |
| 2 | 41.0 | Voluntary Resignation | Travel_Rarely | 1102.0 | Sales | 1.0 | 2.0 | Life Sciences | 1.0 | 1 | ... | 80.0 | 0.0 | 8.0 | 0.0 | 1.0 | 6.0 | 4.0 | 0.0 | 5.0 | Referral |
| 3 | 37.0 | Voluntary Resignation | Travel_Rarely | 807.0 | Human Resources | 6.0 | 4.0 | Marketing | 1.0 | 4 | ... | 80.0 | 0.0 | 8.0 | 0.0 | 1.0 | 6.0 | 4.0 | 0.0 | 5.0 | Referral |
| 4 | 37.0 | Voluntary Resignation | Travel_Rarely | 807.0 | Human Resources | 6.0 | 4.0 | Human Resources | 1.0 | 5 | ... | 80.0 | 0.0 | 8.0 | 0.0 | 1.0 | 6.0 | 4.0 | 0.0 | 5.0 | Referral |


Insight:

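- Confirms that the data loaded correctly
- Helps spot anomalies early: `Attrition` holds status labels such as "Voluntary Resignation" instead of Yes / No, and `EmployeeNumber` 1 appears in rows 0, 1, and 2 (rows 0 and 2 match in every displayed column), which suggests duplicate records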
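The repeated `EmployeeNumber` values in the preview can be checked across the whole dataset by counting exact duplicate rows and identifiers that occur more than once. A sketch on a toy frame that mimics the pattern seen above; in the notebook the same calls would run on `data`.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Age": [41.0, 37.0, 41.0, 37.0],
        "Department": ["Sales", "Human Resources", "Sales", "Human Resources"],
        "EmployeeNumber": ["1", "1", "1", "4"],
    }
)

# Rows that repeat an earlier row in every column.
exact_dupes = int(toy.duplicated().sum())

# Identifiers that occur more than once, with their counts.
id_counts = toy["EmployeeNumber"].value_counts()
repeated_ids = id_counts[id_counts > 1]

print("exact duplicate rows:", exact_dupes)
print(repeated_ids)
```

Exact duplicates are usually safe to drop, while repeated IDs on otherwise different rows need a closer look before deciding which record to keep.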

>STEP 6: INITIAL STATISTICAL OVERVIEW

Objective: Generate basic descriptive statistics.

In [7]:
display(data.describe(include='all'))

| | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | ... | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | Employee Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 23527.0 | 23517 | 23522 | 23519.0 | 23519 | 23521.0 | 23518.0 | 23521 | 23525.0 | 23530.0 | ... | 23520.0 | 23521.0 | 23522.0 | 23519.0 | 23520.0 | 23517.0 | 23515.0 | 23519.0 | 23523.0 | 23518 |
| unique | | 3 | 3 | | 3 | | | 7 | | 23462.0 | ... | | | | | | | | | | 10 |
| top | | Current employee | Travel_Rarely | | Research & Development | | | Life Sciences | | 23244.0 | ... | | | | | | | | | | Company Website |
| freq | | 19712 | 16700 | | 15349 | | | 9725 | | 7.0 | ... | | | | | | | | | | 5428 |
| mean | 36.914354 | | | 802.168375 | | 9.193019 | 2.910962 | | 1.0 | | ... | 80.0 | 0.791548 | 11.2632 | 2.796973 | 2.761437 | 7.006081 | 4.225048 | 2.179642 | 4.122136 | |
| std | 9.130563 | | | 403.198769 | | 8.098043 | 1.023755 | | 0.0 | | ... | 0.0 | 0.850294 | 7.785116 | 1.289328 | 0.705991 | 6.13242 | 3.624251 | 3.213205 | 3.57317 | |
| min | 18.0 | | | 102.0 | | 1.0 | 1.0 | | 1.0 | | ... | 80.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 25% | 30.0 | | | 465.0 | | 2.0 | 2.0 | | 1.0 | | ... | 80.0 | 0.0 | 6.0 | 2.0 | 2.0 | 3.0 | 2.0 | 0.0 | 2.0 | |
| 50% | 36.0 | | | 802.0 | | 7.0 | 3.0 | | 1.0 | | ... | 80.0 | 1.0 | 10.0 | 3.0 | 3.0 | 5.0 | 3.0 | 1.0 | 3.0 | |
| 75% | 43.0 | | | 1157.0 | | 14.0 | 4.0 | | 1.0 | | ... | 80.0 | 1.0 | 15.0 | 3.0 | 3.0 | 9.0 | 7.0 | 3.0 | 7.0 | |


Insight:

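- Summarizes numerical variables (count, mean, standard deviation, quartiles) and categorical variables (number of unique values, most frequent value, and its frequency)
- Shows ranges and quartiles that hint at skew and potential outliers (for example, the mean of `YearsAtCompany`, about 7.0, sits above its median of 5.0)
- Highlights unusual values: `EmployeeCount` (always 1) and `StandardHours` (always 80) have a standard deviation of 0 and carry no information; `Attrition` has 3 unique values; and `EmployeeNumber` has 23,462 unique values across 23,530 rows, with the most frequent ID appearing 7 times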
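`describe()` reports quartiles but does not flag outliers or constant columns by itself. The sketch below adds both checks. For outliers it uses Tukey's 1.5 × IQR fences, a common rule chosen here as an assumption rather than the notebook's stated method. The toy frame is a stand-in for `data`.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "EmployeeCount": [1.0] * 6,
        "StandardHours": [80.0] * 6,
        "DistanceFromHome": [1.0, 2.0, 7.0, 9.0, 14.0, 60.0],
    }
)

# Columns with at most one distinct non-null value carry no information.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=True) <= 1]


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] per numeric column (Tukey's rule)."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series on column names.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


print(constant_cols)
print(iqr_outlier_counts(toy))
```

Values outside the fences are candidates for review, not automatic errors; long tenures or commutes can be genuine.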