# IBM People Analysis

People analytics is a necessary process for improving a company's internal operations. It takes Human Resources (HR) data and translates it into meaningful insights. Many companies already have metrics that they routinely capture, either through surveys or through internal checks. HR can collect a massive amount of raw data, but it is also up to HR to provide the analysis that gives the data meaning. Some areas the data can shed light on are:<br>
1. Employee Retention
1. Job Satisfaction
1. Employee Diversification  

This is an analysis of the IBM HR Analytics Employee Attrition & Performance dataset. Let's see what insights we can discover from it!

In [80]:
import numpy as np
import pandas as pd
import seaborn as sb 
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
data = pd.read_csv('../input/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv')
data.head()

/kaggle/input/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv


Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,...,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,1102,Sales,1,2,Life Sciences,1,1,...,1,80,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,279,Research & Development,8,1,Life Sciences,1,2,...,4,80,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,1373,Research & Development,2,2,Other,1,4,...,2,80,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,1,5,...,3,80,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,591,Research & Development,2,1,Medical,1,7,...,4,80,1,6,3,3,2,2,2,2


In [81]:
print(data.isnull().sum())

Age                         0
Attrition                   0
BusinessTravel              0
DailyRate                   0
Department                  0
DistanceFromHome            0
Education                   0
EducationField              0
EmployeeCount               0
EmployeeNumber              0
EnvironmentSatisfaction     0
Gender                      0
HourlyRate                  0
JobInvolvement              0
JobLevel                    0
JobRole                     0
JobSatisfaction             0
MaritalStatus               0
MonthlyIncome               0
MonthlyRate                 0
NumCompaniesWorked          0
Over18                      0
OverTime                    0
PercentSalaryHike           0
PerformanceRating           0
RelationshipSatisfaction    0
StandardHours               0
StockOptionLevel            0
TotalWorkingYears           0
TrainingTimesLastYear       0
WorkLifeBalance             0
YearsAtCompany              0
YearsInCurrentRole          0
YearsSince


Great, there aren't any null values, which means our dataset is complete. Let's see what this data can show us.

# Visualization

We can start the analysis by looking at the demographics and diversity of the company: the gender split, job level, and marital status.

In [82]:
gender_data = data[['Gender','EmployeeCount']].groupby('Gender').count().reset_index(drop=False)
print(gender_data)
# Pass the columns directly; .unique() would drop a slice if two genders had the same count
fig = go.Figure(data=[go.Pie(labels=gender_data['Gender'], values=gender_data['EmployeeCount'],
                             textinfo='label+percent+value', hole=0, pull=[0.05, 0.05], textfont_size=15,
                             marker=dict(line=dict(color='#000000', width=1.5)))])
fig.show()

   Gender  EmployeeCount
0  Female            588
1    Male            882



It appears that the gender ratio for this IBM dataset is **40% Female and 60% Male**. This isn't surprising, since many companies in the same industry have similar ratios. It suggests that companies should make a bigger effort to bring the ratio as close to 50/50 as possible.
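The percentages quoted above can also be read straight from the data rather than from the pie chart. The following is a minimal sketch, assuming a frame with a `Gender` column like the notebook's `data`; the `toy` frame and its rows are made up for illustration and are not the IBM data.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `data` frame (not the real dataset)
toy = pd.DataFrame({"Gender": ["Female", "Male", "Male", "Female", "Male"]})

# normalize=True returns each category's share of all rows instead of a raw count
gender_share = toy["Gender"].value_counts(normalize=True).mul(100).round(1)
print(gender_share)
```

Swapping `toy` for `data` should give the split for the full dataset.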

In [83]:
def groupsubset_data(df,col1,col2):
    subdata = df.groupby([col1,col2]).count().reset_index(drop=False)
    subdata = subdata[[col1,col2,'EmployeeCount']]
    subdata.columns = [col1,col2,'Counts']
    return subdata

dept_att = groupsubset_data(data,'Department','Attrition')
fig3=px.bar(dept_att,x='Department',y='Counts',color='Attrition',title='Total Employees by Department',color_discrete_sequence=['royalblue','indianred'])
fig3.show()


The **sales department has the highest attrition rate at 20.6%**. The **Human Resources (HR) department has a rate of 19%**, while **Research & Development (R&D) has 13.8%**. One possible reason for the higher rate in sales is a higher level of stress. This may be tied to company profits and revenue: sales staff have to meet deadlines and sales goals so that the company's sales grow each quarter and beat the same quarter of the previous year. The year-over-year (YoY) performance metric is very important to companies because it gauges how well the company is doing and helps guide it through the current year. Pressure from stakeholders can trickle down to sales executives and associates, which ultimately increases stress. I am not suggesting there is no stress in the HR and R&D departments, only that it may be a different kind of stress. R&D roles may have a wider scope than sales roles, which could be one reason its attrition rate is much lower.
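The charts in this notebook plot counts, while the discussion quotes rates. One way to get a rate per group is sketched below. `attrition_rate` is a hypothetical helper, not part of the original notebook, and the `toy` frame is invented for illustration; only the column names `Department` and `Attrition` come from the dataset.

```python
import pandas as pd

def attrition_rate(df, col):
    """Percentage of rows with Attrition == 'Yes' within each value of `col`."""
    left = df["Attrition"].eq("Yes")          # boolean Series: True where the employee left
    # The mean of a boolean Series is the fraction of True values in each group
    return left.groupby(df[col]).mean().mul(100).round(1)

# Hypothetical rows, not the IBM data
toy = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "Sales", "HR", "HR", "R&D", "R&D", "R&D", "R&D"],
    "Attrition":  ["Yes", "No", "No", "No", "Yes", "No", "No", "No", "No", "Yes"],
})

rates = attrition_rate(toy, "Department")
print(rates)
```

The same helper could be called with any other categorical or discrete column, for example `attrition_rate(data, 'JobSatisfaction')`.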

In [84]:
fig = px.histogram(data, x="TotalWorkingYears", color="Attrition",color_discrete_sequence=['indianred','royalblue']
                  ,title='Total Working Years by Employees')
fig.update_layout(bargap=0.01) # gap between bars of adjacent location coordinates


From the histogram above, we can see a **spike in attrition when the total number of working years is 1**. This represents a **49.3% attrition rate**. One reason may be that people take a role to get their foot in the door and then move to a different company where they want to grow their career. The attrition rate then decreases after year 1 until year 7, where there is a small spike at 22%. Many companies require a certain number of years of work experience before they will consider a candidate for a position, which may explain the steady decrease in attrition up to that point. The total number of employees drops dramatically after year 10. This may indicate life events that change employees' circumstances.

In [85]:
fig = px.histogram(data, x="Age", color="Attrition", nbins=43,color_discrete_sequence=['indianred','royalblue']
                  ,title='Total Counts of Employee Age')
fig.update_layout(bargap=0.01)

We can see a steady decline in the total number of employees after age 35. This may be attributed to life events that occur around that age. According to Google, the average age of marriage in the US is 32. Also via Google, a Forbes article indicated that the average age of first-time mothers has increased to 26, while the average age of first-time fathers is up to 31. Many of these life events may take precedence over work, and some people may leave to take care of their family. This is also the time when a working person's parents may fall ill and need care. As employees get older, they head into retirement or become unable to work due to health conditions.

In [86]:
yearcurrent_roll = groupsubset_data(data,'YearsInCurrentRole','Attrition')
fig3=px.bar(yearcurrent_roll,x='YearsInCurrentRole',y='Counts',color='Attrition',title='Employee Years In Current Role',color_discrete_sequence=['royalblue','indianred'])
fig3.show()

The graph above indicates that a person is more likely to leave the company when they have been in their current role for 2 years or less. **At 2 years the attrition rate is about 18%, at 1 year it is 19%, and at 0 years it reaches a high of 29.9%.** There is another small spike at 7 years, at about 14%. This may mean that people either want to move to a higher position in the company within 2 years or will look for the opportunity elsewhere. The same may apply at year 7. Most people would like to advance their career, and a higher position ultimately means a higher income. Let's see how important monthly income is.

In [87]:
# .copy() gives an independent frame, so adding a column does not raise SettingWithCopyWarning
monthly_inc_att = data[['MonthlyIncome','Attrition']].copy()
# Round each income to the nearest 500
monthly_inc_att['RoundedMonthlyIncome'] = (monthly_inc_att['MonthlyIncome'] / 500.0).round() * 500.0
monthly_inc_att = monthly_inc_att.groupby(['RoundedMonthlyIncome','Attrition']).count().reset_index(drop=False)
monthly_inc_att.columns = ['RoundedMonthlyIncome','Attrition','Counts']
fig=px.line(monthly_inc_att,x='RoundedMonthlyIncome',y='Counts',color='Attrition',title='Employee Rounded Monthly Income',color_discrete_sequence=['royalblue','indianred'])
fig.show()



The above is a subset of the data in which monthly income is rounded to the nearest 500. We can see that **attrition is higher when monthly income is relatively low (< 2500). The attrition rate came out to about 29.5% for a monthly income of 2500 and 28.5% for a monthly income of 2000.** This may mean there is a certain standard of living people want to reach, so they look for better opportunities that ultimately pay more. At higher levels of monthly income fewer people tend to leave, as indicated by the relatively flat line once monthly income is > 10k.
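The line chart shows counts per income bucket, so the rates quoted here need one more step. Below is a rough sketch of how the bucketing and the per-bucket rate could be computed together. The `toy` frame is invented for illustration, and `rate_by_bucket` is a name introduced here, not one from the notebook.

```python
import pandas as pd

# Hypothetical rows, not the IBM data
toy = pd.DataFrame({
    "MonthlyIncome": [1900, 2100, 2400, 2600, 9800, 10100],
    "Attrition":     ["Yes", "No", "Yes", "No", "No", "No"],
})

# Work on an explicit copy so that adding a column never touches a view of the source frame
income = toy[["MonthlyIncome", "Attrition"]].copy()

# Dividing by 500, rounding, and multiplying back snaps each income to the nearest 500
income["RoundedMonthlyIncome"] = (income["MonthlyIncome"] / 500).round() * 500

# Share of leavers within each income bucket, as a percentage
rate_by_bucket = (
    income["Attrition"].eq("Yes")
    .groupby(income["RoundedMonthlyIncome"])
    .mean()
    .mul(100)
)
print(rate_by_bucket)
```

Note that pandas rounds exact halves to the nearest even number, so an income that falls exactly halfway between two buckets (for example 2250) may not land in the bucket you expect.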

In [88]:
percHike_att = groupsubset_data(data,'PercentSalaryHike','Attrition')
fig4=px.line(percHike_att,x='PercentSalaryHike',y='Counts',color='Attrition',title='Percent Salary Hike',color_discrete_sequence=['royalblue','indianred'])
fig4.show()

We can see that people are more likely to leave when the percent salary hike is smaller. **Attrition is at 19.5% when the hike is 11%, and the trend decreases as the percent salary hike gets larger, as shown in the chart above.** This is in line with our prior charts, which indicate that people tend to move to higher-paying jobs to meet their goals, whatever they may be. It could be to provide a better life for their family, or they may need the money for other reasons. Retention will likely be higher when the salary hike is larger.

In [89]:
yr_lastpromo = groupsubset_data(data,'YearsSinceLastPromotion','Attrition')
fig4=px.area(yr_lastpromo,x='YearsSinceLastPromotion',y='Counts',color='Attrition',title='Number of Years Since Last Promotion',color_discrete_sequence=['royalblue','indianred'])
fig4.show()

As the chart above shows, employees tend to leave as the number of years since their last promotion grows. There is a **large attrition rate from 0 to 3** years since the last promotion. **The rates are 18.9% at 0 years, 13.7% at 1 year, 16.9% at 2 years, and 17.3% at 3 years.** People who stay beyond that point can be assumed to prefer things as they are, or not to mind going without a promotion: they have been in their role for some time, understand where they are in life, and have adjusted their priorities. The reason could be family and a healthier work-life balance, enjoyment of their current work, or simply that no next position is available.
    

In [90]:
numComp_att = groupsubset_data(data,'NumCompaniesWorked','Attrition')
fig4=px.line(numComp_att,x='NumCompaniesWorked',y='Counts',color='Attrition',title='Number of Companies Worked by Employee',color_discrete_sequence=['royalblue','indianred'])
fig4.show()

As seen in the chart above, employees who have worked at only a few companies appear more likely to leave or switch companies, while people who have worked for more companies tend to stay and advance their careers. One reason people leave or switch so quickly is that they want to gain the work experience needed to get into the company they really want to work for. Typically, a person who wants to advance their career can either move to a company that may pay more or try for a promotion. As noted before, however, people tend to leave if they do not get a promotion in the first few years. Those who do not leave can climb the ladder and work their way up. There are also cases where people who have worked at a company for many years leave for another because of personal goals, a wish to experience different companies, or a desire to challenge themselves. By doing this they build an impressive CV and can attract more attention when looking for jobs.


In [91]:
jobsat_att = groupsubset_data(data,'JobSatisfaction','Attrition')
fig2=px.area(jobsat_att,x='JobSatisfaction',y='Counts',color='Attrition',title='Job Satisfaction Counts',color_discrete_sequence=['royalblue','indianred'])
fig2.show()

We can see that job satisfaction plays an important role in attrition as well. Going from job satisfaction level 1 to 4, the **attrition rate starts at 22.8% and decreases to 16.4%, 16.5%, and 11.33%**. Ultimately, if employees are not satisfied, they won't hesitate to look for other opportunities.

# Conclusion 

After reviewing the data, we can observe a few things.<br>
1. People tend to leave in the early stages of their time at a company if it is their first company. One reason may be that they are gaining the work experience needed to move to the company they wanted to join in the first place.
1. People tend to leave if the monthly income is < 2500. This can vary depending on the location the data is based on, since each state has a different standard of living. People tend to stay as long as they are fairly compensated.
1. People may leave during their mid to late 30s for various reasons. These can overlap with compensation, with moving to a company they wanted to join before but lacked the experience for, or with major life events such as marriage or starting a family, which people may prioritize over work. Once those life events have settled and people have found their work-life balance, they tend to stay at the company longer, since they want a stable job and the opportunity to move up.
1. Different departments have different attrition rates. Depending on the department, different types and levels of stress can affect a person. Stress in sales may have a higher impact than in the HR and R&D departments.
1. Job satisfaction has an impact on attrition.    
<br>        
Of course, some of these observations are not surprising, such as the link between attrition and low compensation or a lack of promotions. Still, it is good to see how this specific company is doing on these points. Some of the observations may be obvious, but it is useful to see the numbers and to ask whether any kind of intervention could change them. While this dataset is very well put together, it would be nice to know which state the company data is based on and what the standard of living is there. It would also be interesting to see the race and ethnicity of the employees.