# Employee Attrition

## Aim: To predict whether an employee will remain in their job or not.

### Description: 
#### The ML model trained on this data predicts whether an employee will be retained by the company or not.
#### Features used:

- Age: Age of the employee (integer)
- Attrition: Whether the employee has left the company or not (the target column)
- BusinessTravel: How often the employee travels for business
- DailyRate: The amount paid to the employee per day, i.e. (MonthlyRate × 12) / total working days in a year
- Department: The department the employee works in
- DistanceFromHome: The distance between the employee's home and the workplace
- Education: Educational background
- EmployeeCount: Count of employees
- EmployeeNumber: The employee's unique number
- EnvironmentSatisfaction: Rating of satisfaction with the work environment
- Gender: Gender of the employee
- HourlyRate: The amount paid to the employee per hour
- JobInvolvement: How dedicated the employee is to their job
- JobLevel: Level of the job
- JobRole: Role of the job
- JobSatisfaction: The employee's rating of their job satisfaction
- MaritalStatus: Whether the employee is married or not
- MonthlyIncome: The employee's fixed income per month
- MonthlyRate: The employee's daily rate totalled over a month
- NumCompaniesWorked: Number of companies the employee has worked at
- Over18: Whether the employee is over 18 or not
- OverTime: Whether the employee works overtime or not
- PercentSalaryHike: Percentage increase in the employee's salary
- PerformanceRating: Rating given for the employee's overall performance
- RelationshipSatisfaction: How satisfied the employee is with their relationships within the organization/company
- StandardHours: A standard hour is the amount of work achievable, at the expected level of efficiency, in an hour
- StockOptionLevel: Level and period of time granted to the employee to buy stocks
- TotalWorkingYears: Number of years worked in their profession
- TrainingTimesLastYear: Number of times the employee was trained in the last year
- WorkLifeBalance: How well the employee balances work and personal life
- YearsAtCompany: Number of years worked at the company
- YearsInCurrentRole: Number of years the employee has worked in their current role
- YearsSinceLastPromotion: Number of years since their last promotion
- YearsWithCurrManager: Number of years the employee has worked under their current manager
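
The DailyRate definition in the list above is plain arithmetic. As a minimal sketch, assuming hypothetical values for the monthly rate and the number of working days (neither figure comes from the dataset, and the function name `daily_rate` is illustrative):

```python
def daily_rate(monthly_rate, working_days_per_year):
    """Daily rate as defined above: (monthly rate x 12) / working days in a year."""
    return (monthly_rate * 12) / working_days_per_year

# Hypothetical figures, chosen only for illustration.
print(daily_rate(2600, 260))  # 120.0
```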

## Pipelines used:
- Exploratory Data Analysis 
- Feature Engineering
- Feature Selection
- Model training and selection
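
The feature values printed later in this notebook all lie between 0 and 1, which is consistent with min-max scaling during feature engineering. The preprocessing code is not part of this notebook, so the following is only a sketch of that technique on a made-up frame; the raw values and the choice of scikit-learn's `MinMaxScaler` are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up raw values for two of the features listed above.
raw = pd.DataFrame({"Age": [18, 39, 60], "DistanceFromHome": [1, 15, 29]})

# Rescale each column to [0, 1]: (value - column min) / (column max - column min).
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(raw), columns=raw.columns)
print(scaled)
```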

In [259]:
# logistic regression
# random forest classifier
# decision tree classifier
# XGBoost classifier
# K-means clustering
# KNN
# SVM (SVC)
# gradient boosting

In [260]:
import pandas as pd
import numpy as np

In [261]:
import warnings
warnings.filterwarnings('ignore')

In [262]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

In [263]:
data_frame=pd.read_csv('final_input_df.csv')

In [264]:
data_frame.head()

Unnamed: 0,Age,DailyRate,Department,DistanceFromHome,EducationField,EnvironmentSatisfaction,JobInvolvement,JobLevel,JobRole,JobSatisfaction,...,NumCompaniesWorked,OverTime,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsWithCurrManager,Attrition
0,0.547619,0.71582,0.5,0.0,1.0,0.333333,0.666667,0.25,1.0,1.0,...,0.888889,0.0,0.0,0.0,0.2,0.0,0.15,0.222222,0.294118,1
1,0.738095,0.1267,1.0,0.25,1.0,0.666667,0.333333,0.25,0.75,0.333333,...,0.111111,1.0,1.0,0.333333,0.25,0.666667,0.25,0.388889,0.411765,0
2,0.452381,0.909807,1.0,0.035714,0.2,1.0,0.333333,0.0,0.875,0.666667,...,0.666667,0.0,0.333333,0.0,0.175,0.666667,0.0,0.0,0.0,1
3,0.357143,0.923407,1.0,0.071429,1.0,1.0,0.666667,0.0,0.75,0.666667,...,0.111111,0.0,0.666667,0.0,0.2,0.666667,0.2,0.388889,0.0,0
4,0.214286,0.350036,1.0,0.035714,0.8,0.0,0.666667,0.0,0.875,0.333333,...,1.0,1.0,1.0,0.333333,0.15,0.666667,0.05,0.111111,0.117647,0
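
The `Unnamed: 0` column in the output above is typically what pandas produces when a frame is saved with its index and read back without `index_col`. A minimal sketch of that round trip, using a small in-memory CSV as a stand-in for `final_input_df.csv`:

```python
import io
import pandas as pd

# A tiny stand-in frame; the real file is not used here.
df = pd.DataFrame({"Age": [0.5, 0.7], "Attrition": [1, 0]})
csv_text = df.to_csv()  # the index is written as an unnamed first column

with_extra = pd.read_csv(io.StringIO(csv_text))
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))     # ['Unnamed: 0', 'Age', 'Attrition']
print(list(without_extra.columns))  # ['Age', 'Attrition']
```

Reading with `index_col=0`, or dropping the column after loading, keeps the row index out of the feature matrix.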


## Splitting the data

In [265]:
from sklearn.model_selection import train_test_split

In [266]:
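# Features: all columns but the last (this still includes 'Unnamed: 0').
x=data_frame.iloc[:,:-1]
# Target: the last column, Attrition.
y=data_frame.iloc[:,-1]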

In [267]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)
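
The split above shuffles rows without regard to the class label. When classes are imbalanced, passing `stratify` keeps the class ratio the same in both splits. A small sketch on synthetic labels (the 80/20 ratio is made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 80 rows of class 0 and 20 rows of class 1.
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 80 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Both splits keep the 20% share of class 1.
print(y_tr.mean(), y_te.mean())
```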

### Defining the model training function

In [268]:
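def model_training(model,model_name):
    """Fit the model on the training split and return its score on the test split."""
    print('Model name: {}'.format(model_name))
    model.fit(x_train,y_train)
    # For classifiers, score() returns the mean accuracy on the given data.
    model_score=model.score(x_test,y_test)
    print('Model score: {}'.format(model_score))
    return model_score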
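
`model_training` scores each model on a single train/test split. Cross-validation is a common way to get an estimate that depends less on one particular split. The sketch below uses synthetic data from `make_classification` instead of the attrition data, and the helper name `cross_validated_score` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def cross_validated_score(model, X, y, folds=5):
    """Return the mean accuracy of the model over `folds` cross-validation folds."""
    fold_scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    return fold_scores.mean()

# Synthetic stand-in data, not the attrition dataset.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
mean_acc = cross_validated_score(LogisticRegression(max_iter=1000), X_demo, y_demo)
print("Mean CV accuracy: {:.3f}".format(mean_acc))
```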

## Logistic Regression

In [269]:
model_training(LogisticRegression(),'Logistic Regression')

Model name: Logistic Regression
Model score: 0.7202702702702702


0.7202702702702702

In [270]:
# Note: KMeans is a clusterer; its score() is the negative K-means objective,
# not accuracy, so the KM value below is not comparable with the others.
models=[LogisticRegression(),RandomForestClassifier(),DecisionTreeClassifier(),XGBClassifier(),GradientBoostingClassifier(),KNeighborsClassifier(),KMeans()]
model_name=['Logistic','RFC','DTC','XGBC','GBC','KNC','KM']

In [271]:
scores=[]
for model,name in zip(models,model_name):
    trained_score_value=model_training(model,name)
    scores.append(trained_score_value)

Model name: Logistic
Model score: 0.7202702702702702
Model name: RFC
Model score: 0.9581081081081081
Model name: DTC
Model score: 0.8932432432432432
Model name: XGBC
Model score: 0.9472972972972973
Model name: GBC
Model score: 0.8608108108108108
Model name: KNC
Model score: 0.8175675675675675
Model name: KM
Model score: -823.9025963184624
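
The negative `KM` score above is expected: `KMeans.score` returns the negative of the K-means objective (the sum of squared distances from each point to its nearest cluster centre), not an accuracy. A small sketch on synthetic blobs, with an arbitrary number of clusters and seed:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic points; the labels returned by make_blobs are not used by KMeans.
X_demo, _ = make_blobs(n_samples=200, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

# On the training data, score() is (up to rounding) the negative of inertia_.
print(km.score(X_demo), -km.inertia_)
```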


In [272]:
# Pick the model with the highest test score.
score_index_value=scores.index(max(scores))
score_model_name=model_name[score_index_value]
print('Model name: {}, Model score: {}'.format(score_model_name,max(scores)))

Model name: RFC, Model score: 0.9581081081081081
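
The `scores` list mixes classifier accuracies with the K-means objective value. A sketch of selecting the best model among the classifiers only, keyed by name and using the accuracies printed above (the dictionary name `classifier_scores` is illustrative):

```python
# Test-set accuracies copied from the output above; KM is left out because
# its score is not an accuracy.
classifier_scores = {
    "Logistic": 0.7202702702702702,
    "RFC": 0.9581081081081081,
    "DTC": 0.8932432432432432,
    "XGBC": 0.9472972972972973,  # XGBoost classifier
    "GBC": 0.8608108108108108,
    "KNC": 0.8175675675675675,
}

# max() with key= returns the name whose score is highest.
best_name = max(classifier_scores, key=classifier_scores.get)
print("Model name: {}, Model score: {}".format(best_name, classifier_scores[best_name]))
```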


In [309]:
# Refit the best model (no random_state is set, so scores vary slightly between runs).
model=RandomForestClassifier()

In [311]:
model.fit(x_train,y_train)

RandomForestClassifier()

In [312]:
model.score(x_test,y_test)

0.9527027027027027
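
This score (0.9527) differs slightly from the 0.9581 obtained in the comparison loop, most likely because `RandomForestClassifier()` is created without a `random_state`, so each fit builds a different forest. A sketch of making the fit repeatable, on synthetic data and with an arbitrary seed (the helper `fit_and_score` is hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not the attrition dataset.
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

def fit_and_score(seed):
    """Fit a forest with a fixed seed and return its test accuracy."""
    forest = RandomForestClassifier(random_state=seed)
    forest.fit(X_tr, y_tr)
    return forest.score(X_te, y_te)

# The same seed gives the same forest, and therefore the same score.
print(fit_and_score(7) == fit_and_score(7))  # True
```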

In [313]:
y_pred=model.predict(x_test)

In [314]:
from sklearn.metrics import classification_report,confusion_matrix

In [315]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.97      0.93      0.95       369
           1       0.94      0.97      0.95       371

    accuracy                           0.95       740
   macro avg       0.95      0.95      0.95       740
weighted avg       0.95      0.95      0.95       740



In [316]:
# For binary labels 0 and 1, ravel() yields the counts in the order tn, fp, fn, tp.
tn,fp,fn,tp=confusion_matrix(y_test,y_pred).ravel()

In [317]:
print('True Positive: ',tp)
print('True Negative: ',tn)
print('False Positive: ',fp)
print('False Negative: ',fn)

True Positive:  360
True Negative:  345
False Positive:  24
False Negative:  11
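
These four counts are enough to reproduce the class 1 row and the accuracy in the classification report above. A short worked check using the printed counts:

```python
# Counts copied from the confusion-matrix output above.
tp, tn, fp, fn = 360, 345, 24, 11

precision = tp / (tp + fp)                   # 360 / 384
recall = tp / (tp + fn)                      # 360 / 371
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 705 / 740
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 4))
# 0.94 0.97 0.95 0.9527
```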


In [318]:
import pickle
# Save the trained model to disk so it can be reused without retraining.
with open('hr_model_pickle.pkl','wb') as file:
    pickle.dump(model,file)
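
To use the saved model later, it can be loaded back with `pickle.load`. The sketch below round-trips a small stand-in model through a temporary file rather than `hr_model_pickle.pkl`; the file name `demo_model.pkl` and the synthetic data are assumptions for illustration.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model trained on synthetic data.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
original = RandomForestClassifier(random_state=0).fit(X_demo, y_demo)

path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")
with open(path, "wb") as file:
    pickle.dump(original, file)

# Load the model back and check that it predicts the same labels.
with open(path, "rb") as file:
    restored = pickle.load(file)

print((restored.predict(X_demo) == original.predict(X_demo)).all())  # True
```

Pickle files should only be loaded from a trusted source, and ideally with the same scikit-learn version that wrote them.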