Skip to content
This project investigates IBM Human Resources Attrition Data Set.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.
results Add files via upload Jun 22, 2019

IBM Human Resources Attrition

This project investigates IBM Human Resources Attrition Data Set.


I want to thank Natasha for the blog on People Analytic for providing background knowledge for me to understand Human Resources departmets. I also want to thank IBM analysts for providing data set on Kaggle.


Companies hire many employees every year. To create a positive working and learning environment, firms invest time and money in trianing the new members and also to get existing employees involved as well. The goal of these programs aim to increase the effectiveness of the employees and in doing so the firm as a whole can have better output in long run.


The single most important feature we are interested in is attrition. Attrition in human resources refer to the gradual loss of employees over time. In general relatively high attrition is problematic for companies. Human Resource professionals often asume a leadership role in designing company compensation programs, work culture and motivation systems that help the organization retain top employees.


To investigate this topic, I use IBM HR Analytics Employee Attrition and Performance Dataset. There are total of 1470 samples and 35 features. Among the target, Attrition, there are 237 candidates committed to Yes (i.e. left the company) and the rest 1233 candidates committed to No (i.e. stayed at the company).


We can explore using histograms and correlation plots.

Multiple histograms:

Correlation plot:

Based on correlation, Attrition is associated negatively with Age, JobInvolvement, JobLevel, Jobsatisfaction, MonthlyIncome, StockOptionLevel, TotalWorkingYears, YearsAtCompany, YearsInCurrentCompany, YearsInCurrentRole, and YearsWithCurrManager.

For the people who left the firm (committed to Yes to Attrition), the most common JobRole is Laboratory Technician and Sales Representative. From our analysis below, we see that the Laboratory Technician who spent a year at the firm and than left sat on a high of 30.9% among those who committed Yes to Attrition. The second is Sales Representative that stayed at the firm for a year, at 9.1%. The third group of people who stayed at the firm for a year and left are Research Scientist, at a shy of 17.6%. These are the top three demographics that contribute to the Attrition the highest.


We select the following variables based on partition influence measure.

Top Module Measure Variables Names
8, 15, 29 13.86 EducationField, JobRole, YearsAtCompany
15, 29 13.3 JobRole, YearsAtCompany

We attempted using Bagging, Gradient Boosting Machine, Naive Bayes, Linear Model, Random Forest, iterative Random Forest, and Bayesian Additive Regression Tree (BART).

Name Result (Measured by AUC)
Bagging or Bootstrap Aggregation 0.904
Gradient Boosting Machine 0.526
Naive Bayes 0.50
Linear Model or Least Squares 0.745
Random Forest 0.544
iterative Random Forest 0.808
Bayesian Additive Regression Tree (BART) 0.890

The implication is that a new candidate walks in the door and by collecting the information of EducationField, JobRole, and YearsAtCompany I can tell manager this employee will commit Yes to Attrition with a certain probability that is on average 90% accurate!

You can’t perform that action at this time.