# Project Brief: Green Destinations

Green Destinations is a well-known travel agency. The HR Director has recently noticed an increase in employees leaving (attrition).

She would like to figure out any trends or patterns. She has surveyed the staff of Green Destinations and provided you with the data. She would like to know what the attrition rate is (% of people who left). She would also like to know if factors like age, years at the company and income play a part in determining if people will leave or not.

For this project, we will follow a 6-step data analysis approach:

Ask - Prepare - Process - Analyse - Share - Act

### 1. Ask

- What specific data has been collected for analysis? Are there any limitations or gaps in the data that might affect the analysis?
- How are years at the company calculated? Is it based on tenure or hire date?
- Have reasons for leaving been recorded for each departing employee? If so, what categories do these reasons fall into e.g. better job opportunity, dissatisfaction with management, relocation?

KPIs

- No. of employees
- No. of employees who left (attrition count)
- Attrition rate
- Average age
- Average salary
- Average years at the company

### 2. Prepare

- We received our dataset from the management team directly
- The dataset is stored in the csv folder as *greendestination-dataset.csv*

### 3. Process

In [109]:
#importing python libraries
import pandas as pd

In [2]:
#importing dataset into dataframe variable
dataframe = pd.read_csv('./csv/greendestination-dataset.csv')

In [3]:
#printing top 5 records from the top of dataset
dataframe.head(5)

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,...,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,1102,Sales,1,2,Life Sciences,1,1,...,1,80,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,279,Research & Development,8,1,Life Sciences,1,2,...,4,80,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,1373,Research & Development,2,2,Other,1,4,...,2,80,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,1,5,...,3,80,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,591,Research & Development,2,1,Medical,1,7,...,4,80,1,6,3,3,2,2,2,2


In [4]:
#removing useless columns
del dataframe['EmployeeCount']
del dataframe['StandardHours']
del dataframe['Over18']
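
The three columns removed above appear to hold a single value each (the preview shows `EmployeeCount` as 1 and `StandardHours` as 80 in every row). As a minimal sketch, such constant columns could be detected rather than listed by hand. The `sample` frame below is a made-up stand-in for the real dataset, and the `Over18` value `'Y'` is an assumption because that column is not visible in the preview.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; only a few columns are reproduced.
sample = pd.DataFrame({
    "Age": [41, 49, 37],
    "EmployeeCount": [1, 1, 1],
    "StandardHours": [80, 80, 80],
    "Over18": ["Y", "Y", "Y"],  # assumed value, not shown in the preview
})

# A column with exactly one distinct value cannot explain differences in attrition.
constant_columns = sample.columns[sample.nunique() == 1].tolist()
cleaned = sample.drop(columns=constant_columns)

print(f"Constant columns: {constant_columns}")
print(f"Remaining columns: {list(cleaned.columns)}")
```

Running the same `nunique()` check on `dataframe` before deleting anything would confirm whether the three columns really are constant.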

In [5]:
dataframe.head(5)

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeNumber,EnvironmentSatisfaction,...,PerformanceRating,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,1102,Sales,1,2,Life Sciences,1,2,...,3,1,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,279,Research & Development,8,1,Life Sciences,2,3,...,4,4,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,1373,Research & Development,2,2,Other,4,4,...,3,2,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,1392,Research & Development,3,4,Life Sciences,5,4,...,3,3,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,591,Research & Development,2,1,Medical,7,1,...,3,4,1,6,3,3,2,2,2,2


In [6]:
#printing rows & columns of our dataset
print(f"Rows: {dataframe.shape[0]}\nColumns: {dataframe.shape[1]}")

Rows: 1470
Columns: 32


In [7]:
#checking whether dataset has any NULL values
dataframe.isnull().sum()

Age                         0
Attrition                   0
BusinessTravel              0
DailyRate                   0
Department                  0
DistanceFromHome            0
Education                   0
EducationField              0
EmployeeNumber              0
EnvironmentSatisfaction     0
Gender                      0
HourlyRate                  0
JobInvolvement              0
JobLevel                    0
JobRole                     0
JobSatisfaction             0
MaritalStatus               0
MonthlyIncome               0
MonthlyRate                 0
NumCompaniesWorked          0
OverTime                    0
PercentSalaryHike           0
PerformanceRating           0
RelationshipSatisfaction    0
StockOptionLevel            0
TotalWorkingYears           0
TrainingTimesLastYear       0
WorkLifeBalance             0
YearsAtCompany              0
YearsInCurrentRole          0
YearsSinceLastPromotion     0
YearsWithCurrManager        0
dtype: int64

- All columns are named properly
- No null values present in our dataset


### 4. Analyse

KPIs

In [9]:
#finding number of employees
print(f"No. of employees: {len(dataframe)}")

No. of employees: 1470


In [10]:
#finding number of employees who left (attrition count)
attrition_count = (dataframe['Attrition'] == 'Yes').sum()

print(f"No. of attrition: {attrition_count}")

No. of attrition: 237


In [24]:
#finding attrition rate
print(f"Attrition rate: {round((attrition_count/len(dataframe['Attrition']))*100, 1)} %")

Attrition rate: 16.1 %


In [96]:
#finding average age
print(f"Average age: {round(dataframe['Age'].mean(), 0).astype(int)}")

Average age: 37


In [97]:
#finding average salary
print(f"Average salary: $ {round(dataframe['MonthlyIncome'].mean(), 0).astype(int)}")

Average salary: $ 6503


In [23]:
#finding average years
print(f"Average years: {round(dataframe['YearsAtCompany'].mean(), 1)}")

Average years: 7.0
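
The averages above cover all staff. Since the brief asks whether age, income and years at the company play a part in attrition, one possible next step is to split the same averages by the `Attrition` column. The sketch below uses a small made-up `sample` frame (the values are hypothetical, not taken from the dataset) with the dataset's column names.

```python
import pandas as pd

# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "Attrition": ["Yes", "No", "Yes", "No", "No", "Yes"],
    "Age": [28, 45, 31, 38, 50, 25],
    "MonthlyIncome": [2500, 9000, 3200, 6100, 12000, 2100],
    "YearsAtCompany": [1, 10, 3, 7, 15, 2],
})

# One row per attrition status, one column per factor named in the brief.
factors = ["Age", "MonthlyIncome", "YearsAtCompany"]
by_status = sample.groupby("Attrition")[factors].mean().round(1)

print(by_status)
```

Replacing `sample` with `dataframe` would give the leaver and stayer averages for the real data.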


Cards

In [114]:
#segregating employees by gender
gender_count = dataframe['Gender'].value_counts()
#looking up by label, so the result does not depend on which gender is more frequent
print(f"Male: {gender_count['Male']}")
print(f"Female: {gender_count['Female']}")

Male: 882
Female: 588


In [110]:
#finding attrition by gender
gender_attrition = dataframe.loc[dataframe['Attrition'] == 'Yes', 'Gender'].value_counts()
male_attrition_count = gender_attrition.get('Male', 0)
female_attrition_count = gender_attrition.get('Female', 0)

print("Attrition by Gender:")
print(f"Male: {male_attrition_count}")
print(f"Female: {female_attrition_count}")

Attrition by Gender:
Male: 150
Female: 87
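
Counts alone are hard to compare because there are more men (882) than women (588) on staff. A rate per group is more informative. The sketch below defines a hypothetical helper, `attrition_rate_by`, and applies it to a frame rebuilt from the four counts printed above, so it runs without the CSV file.

```python
import pandas as pd

def attrition_rate_by(df, column):
    """Percentage of rows with Attrition == 'Yes' within each value of `column`."""
    left = df["Attrition"].eq("Yes")
    return left.groupby(df[column]).mean().mul(100).round(1)

# Rebuilt from the printed counts: 150 of 882 men and 87 of 588 women left.
rebuilt = pd.DataFrame({
    "Gender": ["Male"] * 882 + ["Female"] * 588,
    "Attrition": ["Yes"] * 150 + ["No"] * 732 + ["Yes"] * 87 + ["No"] * 501,
})

rates = attrition_rate_by(rebuilt, "Gender")
print(rates)
```

With these counts the rates come out at 17.0 % for men and 14.8 % for women. The same helper could be called as `attrition_rate_by(dataframe, 'JobRole')` or with any other categorical column.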


In [86]:
#finding attrition by job role
unique_job_role = dataframe['JobRole'].unique()
print(f"No. of job roles: {len(unique_job_role)}")
print()

#counting leavers per job role in a single pass
job_role_attrition = dataframe.loc[dataframe['Attrition'] == 'Yes', 'JobRole'].value_counts()

print("Attrition by Job Role:")
#roles are printed in order of first appearance in the dataset
for job_role in unique_job_role:
    print(f"{job_role}: {job_role_attrition.get(job_role, 0)}")

No. of job roles: 9

Attrition by Job Role:
Sales Executive: 57
Research Scientist: 47
Laboratory Technician: 62
Manufacturing Director: 10
Healthcare Representative: 9
Manager: 5
Sales Representative: 33
Research Director: 2
Human Resources: 12


In [95]:
#finding attrition by age
count_age_18_to_25 = 0
count_age_26_to_35 = 0
count_age_36_to_45 = 0
count_age_46_to_55 = 0
count_age_56_to_60 = 0

for i in range(len(dataframe)):
    if dataframe['Attrition'][i] == 'Yes':
        if dataframe['Age'][i] >= 18 and dataframe['Age'][i] <= 25:
            count_age_18_to_25 += 1
        elif dataframe['Age'][i] >= 26 and dataframe['Age'][i] <= 35:
            count_age_26_to_35 += 1
        elif dataframe['Age'][i] >= 36 and dataframe['Age'][i] <= 45:
            count_age_36_to_45 += 1
        elif dataframe['Age'][i] >= 46 and dataframe['Age'][i] <= 55:
            count_age_46_to_55 += 1
        elif dataframe['Age'][i] >= 56 and dataframe['Age'][i] <= 60:
            count_age_56_to_60 += 1

print(f"Attrition from (18 - 25) yrs: {count_age_18_to_25}")
print(f"Attrition from (26 - 35) yrs: {count_age_26_to_35}")
print(f"Attrition from (36 - 45) yrs: {count_age_36_to_45}")
print(f"Attrition from (46 - 55) yrs: {count_age_46_to_55}")
print(f"Attrition from (56 - 60) yrs: {count_age_56_to_60}")

Attrition from (18 - 25) yrs: 44
Attrition from (26 - 35) yrs: 116
Attrition from (36 - 45) yrs: 43
Attrition from (46 - 55) yrs: 26
Attrition from (56 - 60) yrs: 8
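
The age bands above can also be built with `pd.cut`, which makes it easy to report the headcount and the attrition rate of each band next to the number of leavers. This is a sketch on made-up rows (the `sample` values are hypothetical), using the same band edges as the cell above.

```python
import pandas as pd

# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "Age": [19, 24, 29, 33, 35, 41, 47, 58],
    "Attrition": ["Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"],
})

# Right-inclusive edges: (17, 25], (25, 35], (35, 45], (45, 55], (55, 60]
age_bins = [17, 25, 35, 45, 55, 60]
age_labels = ["18-25", "26-35", "36-45", "46-55", "56-60"]
sample["AgeBand"] = pd.cut(sample["Age"], bins=age_bins, labels=age_labels)

summary = (
    sample.assign(Left=sample["Attrition"].eq("Yes"))
    .groupby("AgeBand", observed=False)["Left"]
    .agg(leavers="sum", headcount="count")
)
summary["rate_pct"] = (summary["leavers"] / summary["headcount"] * 100).round(1)

print(summary)
```

A band with many leavers but also a large headcount may have an ordinary rate, which is why the `rate_pct` column is worth reading alongside the counts.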


In [99]:
#finding attrition by salary
#band edges are right-inclusive: upto 2000, 2001 - 5000, 5001 - 10000, 10001 - 15000, 15001 and above
income_bins = [0, 2000, 5000, 10000, 15000, float('inf')]
income_labels = ['upto 2k', '2k - 5k', '5k - 10k', '10k - 15k', '15k plus']

leaver_income = dataframe.loc[dataframe['Attrition'] == 'Yes', 'MonthlyIncome']
income_band = pd.cut(leaver_income, bins=income_bins, labels=income_labels, include_lowest=True)
income_attrition = income_band.value_counts(sort=False)

for band, count in income_attrition.items():
    print(f"Attrition salary band from {band}: {count}")

Attrition salary band from upto 2k: 18
Attrition salary band from 2k - 5k: 145
Attrition salary band from 5k - 10k: 49
Attrition salary band from 10k - 15k: 20
Attrition salary band from 15k plus: 5


In [107]:
#finding attrition by education
unique_education = dataframe['EducationField'].unique()
print(f"No. of education fields: {len(unique_education)}")
print()

#counting leavers per education field in a single pass
education_attrition = dataframe.loc[dataframe['Attrition'] == 'Yes', 'EducationField'].value_counts()

print("Attrition by Education:")
#fields are printed in order of first appearance in the dataset
for education_field in unique_education:
    print(f"{education_field}: {education_attrition.get(education_field, 0)}")

No. of education fields: 6

Attrition by Education:
Life Sciences: 89
Other: 11
Medical: 63
Marketing: 35
Technical Degree: 32
Human Resources: 7


In [108]:
dataframe.to_csv('./csv/cleaned-greendestination-dataset.csv', index = False)

### 5. Share

- I will be sharing all charts and graphs using Tableau
- Please use the dashboard link

### 6. Act

Insights

- No. of employees: 1470
- No. of attrition: 237
- Attrition rate: 16.1 %
- Average age: 37
- Average salary: $ 6503
- Average years: 7.0
- Male: 882, Female: 588
- Attrition by Gender: Male: 150, Female: 87
- No. of job roles: 9
- Attrition by Job Role:
  1. Sales Executive: 57
  2. Research Scientist: 47
  3. Laboratory Technician: 62
  4. Manufacturing Director: 10
  5. Healthcare Representative: 9
  6. Manager: 5
  7. Sales Representative: 33
  8. Research Director: 2
  9. Human Resources: 12
- Attrition by Age:
  1. (18 - 25) yrs: 44
  2. (26 - 35) yrs: 116
  3. (36 - 45) yrs: 43
  4. (46 - 55) yrs: 26
  5. (56 - 60) yrs: 8
- Attrition by Salary:
  1. upto 2k: 18
  2. 2k - 5k: 145
  3. 5k - 10k: 49
  4. 10k - 15k: 20
  5. 15k plus: 5
- No. of education fields: 6
- Attrition by Education:
  1. Life Sciences: 89
  2. Other: 11
  3. Medical: 63
  4. Marketing: 35
  5. Technical Degree: 32
  6. Human Resources: 7

Recommendations

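These figures are raw counts of leavers, so they show where attrition is concentrated rather than which groups are most likely to leave. Most of the 237 leavers are aged 26 - 35 (116) and earn between 2k and 5k per month (145), and three job roles (Laboratory Technician, Sales Executive and Research Scientist) account for 166 of them. By gender the rates are close: 150 of 882 men (17.0 %) and 87 of 588 women (14.8 %) left. Before concluding that age, gender, salary, job role or education field influence attrition, each count should be compared with the headcount of its group. Further analysis could then delve into the reasons behind these trends and formulate strategies to mitigate attrition, such as targeted retention programs, career development initiatives, and salary adjustments.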
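
Whether a gap between groups is larger than chance alone would produce can be checked with a chi-square test of independence. The sketch below computes the statistic by hand for gender, using only the four counts reported in the Insights list. It applies no continuity correction, and 3.841 is the standard 5 % critical value for one degree of freedom. It is an illustration of the idea, not a full statistical analysis.

```python
import pandas as pd

# Observed counts from the Insights list: leavers and stayers by gender.
observed = pd.DataFrame(
    {"Yes": [150, 87], "No": [732, 501]},
    index=["Male", "Female"],
)

row_totals = observed.sum(axis=1).to_numpy()
col_totals = observed.sum(axis=0).to_numpy()
grand_total = observed.to_numpy().sum()

# Expected count under independence = row total * column total / grand total
expected = pd.DataFrame(
    row_totals[:, None] * col_totals[None, :] / grand_total,
    index=observed.index,
    columns=observed.columns,
)

chi2 = float(((observed - expected) ** 2 / expected).to_numpy().sum())
CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

print(f"Chi-square statistic: {chi2:.3f}")
print(f"Exceeds 5 % critical value: {chi2 > CRITICAL_5PCT_DF1}")
```

For gender the statistic is about 1.28, below the critical value, so on these counts the male and female attrition rates are not distinguishable at the 5 % level. On the real data the observed table could be built with `pd.crosstab(dataframe['Gender'], dataframe['Attrition'])`. If SciPy is available, `scipy.stats.chi2_contingency` also returns a p-value, though by default it applies a continuity correction to 2x2 tables and so gives a slightly different statistic.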