<font size=7>IBM HR Analytics Employee Attrition & Performance</font>

<font size=6>Logistic Regression Model</font>

In [1]:
try:
    from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn import metrics    # calculate accuracy measures and confusion matrix
    from sklearn.preprocessing import LabelEncoder
    from scipy import stats
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    import warnings
    import sys, re, os
    from tabulate import tabulate
    warnings.filterwarnings("ignore")
    pd.set_option("display.max_columns", 500)
    pd.set_option("display.max_rows", 500)
    print("Modules imported successfully...")

except Exception as err:
    print(err)


Modules imported successfully...


In [2]:
# Defined Functions :

# Defining a function that reads the file, splits the data into train and test sets, and prints the shape of each.
# setting dir:
main_folder = "D:/Python Practice/IBM Attrition Dataset"
file_name = "train.csv"

def splitting_data(path, test_size = 0.2) -> tuple:
    data = pd.read_csv(path)
    # Pass the test_size argument through instead of hard-coding 0.2
    train, test= train_test_split(data, test_size = test_size, random_state= 42)
    print(f"Train shape: {train.shape}")
    print(f"Test shape: {test.shape}")
    return train, test

# Defining a function to replace values from a column
def process_attrition(data, map_column, classes):    
    return data[map_column].map(classes)

        
# Defining a function to perform hypothesis testing using point-biserial correlation
def cat_to_cont_hypo(data,confidence = 0.95, test = "point", tail =2):
    """test: point -> point-biserial correlation (the only test implemented here)
            z -> z-test
            t -> t-test
            anova -> ANOVA"""
    # Any value other than 2 is treated as a one-tailed test
    if tail != 2:
        tail = 1

    if test == "point": 
        for col in data.columns:
            if data[col].nunique()>10:
                null = "has no significant effect on attrition"
                alternate= "has significant effect on attrition"
                print(f"Null= {col}",null)
                print(f"Alternate= {col}",alternate)   
                r, p = stats.pointbiserialr(data[col], data["Attrition"])
                print("r= ",r,"p= ", p)
                if p < (1 - confidence)/tail:
                    print("Reject null","\n")
                else:
                    print("Accept null","\n")
    else:
        pass


# Defining a function to separate the continuous and categorical variables.
def seprate_data_type(data):    
    cat=[]
    cont=[]
    for col in data.columns:
        if data[col].nunique()<10:
            cat.append(col)
        else:
            cont.append(col)
    return cat, cont


# Defining a function to encode the categorical data
def label_encode(data):
    labelencoder = LabelEncoder()
    columns=data.select_dtypes([object])
    for col in columns:
        data[col]=labelencoder.fit_transform(data[col])
    return data 


# Defining a function to treat the outliers present in the data.
# def treat_outliers_iqr(data,outliers_list):    
#     for outlier in outliers_list:
#         q1 = np.percentile(data[outlier], 25)
#         q3 = np.percentile(data[outlier], 75)
#         # print(q1, q3)
#         IQR = q3-q1
#         lwr_bound = q1-(1.5*IQR)
#         upr_bound = q3+(1.5*IQR)
#         data[outlier] = np.where(data[outlier] > upr_bound, upr_bound, data[outlier])
#         data[outlier] = np.where(data[outlier] < lwr_bound, lwr_bound, data[outlier])
#     return data

In [3]:
# Splitting the data into train and test data.
df_train,df_test=splitting_data(main_folder+"/"+file_name)

Train shape: (1176, 35)
Test shape: (294, 35)
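
The split above is purely random, so the share of attrition cases can differ between the train and test sets. A minimal sketch of a stratified split follows, assuming a small synthetic frame (`toy`, with made-up values) in place of the real CSV; the `stratify` argument of `train_test_split` keeps the class proportions the same in both parts.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the attrition data: 16% "Yes" labels.
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "Age": rng.integers(18, 60, size=1000),
    "Attrition": np.where(np.arange(1000) < 160, "Yes", "No"),
})

# stratify= keeps the Yes/No ratio equal in the train and test parts.
train_s, test_s = train_test_split(
    toy, test_size=0.2, random_state=42, stratify=toy["Attrition"]
)

print(train_s["Attrition"].value_counts(normalize=True))
print(test_s["Attrition"].value_counts(normalize=True))
```

Whether stratification changes the results on the real data is not shown here; it mainly makes the test-set metrics less dependent on the random seed.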


In [4]:
# Checking if the training data contains any missing values and raise an error if found.
assert df_train.isnull().sum().sum() ==0, "Data is having null values"

### Problem: IBM wants to invest in its employees and needs to know whether a given person is likely to leave. An employee with a high chance of leaving the company would not be eligible for training or promotion.

## Train Data Analysis

In [6]:
# Displaying the top 5 rows of the data
df_train.head()

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,EnvironmentSatisfaction,Gender,HourlyRate,JobInvolvement,JobLevel,JobRole,JobSatisfaction,MaritalStatus,MonthlyIncome,MonthlyRate,NumCompaniesWorked,Over18,OverTime,PercentSalaryHike,PerformanceRating,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
1097,24,No,Travel_Rarely,350,Research & Development,21,2,Technical Degree,1,1551,3,Male,57,2,1,Laboratory Technician,1,Divorced,2296,10036,0,Y,No,14,3,2,80,3,2,3,3,1,1,0,0
727,18,No,Non-Travel,287,Research & Development,5,2,Life Sciences,1,1012,2,Male,73,3,1,Research Scientist,4,Single,1051,13493,1,Y,No,15,3,4,80,0,0,2,3,0,0,0,0
254,29,No,Travel_Rarely,1247,Sales,20,2,Marketing,1,349,4,Male,45,3,2,Sales Executive,4,Divorced,6931,10732,2,Y,No,14,3,4,80,1,10,2,3,3,2,0,2
1175,39,No,Travel_Rarely,492,Research & Development,12,3,Medical,1,1654,4,Male,66,3,2,Manufacturing Director,2,Married,5295,7693,4,Y,No,21,4,3,80,0,7,3,3,5,4,1,0
1341,31,No,Travel_Rarely,311,Research & Development,20,3,Life Sciences,1,1881,2,Male,89,3,2,Laboratory Technician,3,Divorced,4197,18624,1,Y,No,11,3,1,80,1,10,2,3,10,8,0,2


In [7]:
# Displaying all the columns of the data
df_train.columns

Index(['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department',
       'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount',
       'EmployeeNumber', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate',
       'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction',
       'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',
       'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating',
       'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel',
       'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance',
       'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',
       'YearsWithCurrManager'],
      dtype='object')

In [8]:
# Displaying the categorical and cont. variables in tabular form
categorical,continuous=seprate_data_type(df_train)   
print(tabulate({"categorical":categorical,
                "continuous":continuous},headers=["categorical","continuous"]))

categorical               continuous
------------------------  -----------------------
Attrition                 Age
BusinessTravel            DailyRate
Department                DistanceFromHome
Education                 EmployeeNumber
EducationField            HourlyRate
EmployeeCount             MonthlyIncome
EnvironmentSatisfaction   MonthlyRate
Gender                    NumCompaniesWorked
JobInvolvement            PercentSalaryHike
JobLevel                  TotalWorkingYears
JobRole                   YearsAtCompany
JobSatisfaction           YearsInCurrentRole
MaritalStatus             YearsSinceLastPromotion
Over18                    YearsWithCurrManager
OverTime
PerformanceRating
RelationshipSatisfaction
StandardHours
StockOptionLevel
TrainingTimesLastYear
WorkLifeBalance


### For the chi-square test, only the dichotomous categorical columns are considered

In [9]:
# Displaying only those columns which are dichotomous in nature to perform chi-square test.
for col in df_train.columns:
    if df_train[col].nunique()==2:
        print(col)

Attrition
Gender
OverTime
PerformanceRating


In [10]:
# Encoding the dichotomous variables

df_train["Gender"]=df_train["Gender"].map({"Male":1,"Female":0})
df_train["OverTime"]=df_train["OverTime"].map({"Yes":1,"No":0})
df_train["PerformanceRating"]=df_train["PerformanceRating"].map({4:1,3:0})

attrition_dict = {"Yes":1, "No": 0}
df_train.Attrition=process_attrition(df_train,"Attrition",attrition_dict)

In [11]:
# Checking if all the dichotomous variables are encoded.
df_train.head()

Unnamed: 0,Age,Attrition,BusinessTravel,DailyRate,Department,DistanceFromHome,Education,EducationField,EmployeeCount,EmployeeNumber,EnvironmentSatisfaction,Gender,HourlyRate,JobInvolvement,JobLevel,JobRole,JobSatisfaction,MaritalStatus,MonthlyIncome,MonthlyRate,NumCompaniesWorked,Over18,OverTime,PercentSalaryHike,PerformanceRating,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
1097,24,0,Travel_Rarely,350,Research & Development,21,2,Technical Degree,1,1551,3,1,57,2,1,Laboratory Technician,1,Divorced,2296,10036,0,Y,0,14,0,2,80,3,2,3,3,1,1,0,0
727,18,0,Non-Travel,287,Research & Development,5,2,Life Sciences,1,1012,2,1,73,3,1,Research Scientist,4,Single,1051,13493,1,Y,0,15,0,4,80,0,0,2,3,0,0,0,0
254,29,0,Travel_Rarely,1247,Sales,20,2,Marketing,1,349,4,1,45,3,2,Sales Executive,4,Divorced,6931,10732,2,Y,0,14,0,4,80,1,10,2,3,3,2,0,2
1175,39,0,Travel_Rarely,492,Research & Development,12,3,Medical,1,1654,4,1,66,3,2,Manufacturing Director,2,Married,5295,7693,4,Y,0,21,1,3,80,0,7,3,3,5,4,1,0
1341,31,0,Travel_Rarely,311,Research & Development,20,3,Life Sciences,1,1881,2,1,89,3,2,Laboratory Technician,3,Divorced,4197,18624,1,Y,0,11,0,1,80,1,10,2,3,10,8,0,2


In [12]:
new_df=df_train[["Attrition","Gender","OverTime","PerformanceRating"]]
new_df.head()

Unnamed: 0,Attrition,Gender,OverTime,PerformanceRating
1097,0,1,0,0
727,0,1,0,0
254,0,1,0,0
1175,0,1,0,1
1341,0,1,0,0


In [13]:
x=new_df.iloc[:,1:]
y=new_df.iloc[:,0]
y

1097    0
727     0
254     0
1175    0
1341    0
       ..
1130    0
1294    0
860     1
1459    0
1126    0
Name: Attrition, Length: 1176, dtype: int64

In [14]:
# Performing chi-square test using sklearn library
from sklearn.feature_selection import chi2
score=chi2(x,y)
print(score)
p_value_chi=round(pd.Series(score[1]),2)
p_value_chi.index=x.columns
p_value_chi

(array([6.40744443e-01, 6.58869059e+01, 2.56827553e-02]), array([4.23441352e-01, 4.77553834e-16, 8.72677526e-01]))


Gender               0.42
OverTime             0.00
PerformanceRating    0.87
dtype: float64

In [15]:
# Rejecting or Accepting the null hypothesis by comparing the p-values with the significance value
confidence=0.95
for index,p in p_value_chi.items():
    null = "has no significant effect on attrition"
    alternate= "has significant effect on attrition"
    print(f"Null= {index}",null)
    print(f"Alternate= {index}",alternate)
    print("p_value: ",p)
    # The chi-square p-value is already an upper-tail probability, so compare it with alpha directly
    if p < (1 - confidence):
        print("Reject null","\n")
    else:
        print("Accept null","\n")

Null= Gender has no significant effect on attrition
Alternate= Gender has significant effect on attrition
p_value:  0.42
Accept null 

Null= OverTime has no significant effect on attrition
Alternate= OverTime has significant effect on attrition
p_value:  0.0
Reject null 

Null= PerformanceRating has no significant effect on attrition
Alternate= PerformanceRating has significant effect on attrition
p_value:  0.87
Accept null 
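
`sklearn.feature_selection.chi2` is designed for feature selection on non-negative features and treats the feature values as counts. The textbook test of independence between two categorical variables works on a contingency table instead. A minimal sketch with `scipy.stats.chi2_contingency` follows, assuming a small hand-made frame (`toy`) rather than the real data; the counts are invented for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: by construction, overtime workers leave more often.
toy = pd.DataFrame({
    "OverTime":  [1] * 40 + [1] * 60 + [0] * 20 + [0] * 180,
    "Attrition": [1] * 40 + [0] * 60 + [1] * 20 + [0] * 180,
})

# Cross-tabulate the two dichotomous variables into a 2x2 contingency table.
table = pd.crosstab(toy["OverTime"], toy["Attrition"])

# Test of independence; for 2x2 tables SciPy applies Yates' correction by default.
chi2_stat, p_value, dof, expected = chi2_contingency(table)

alpha = 0.05
print(table)
print(f"chi2={chi2_stat:.2f}, p={p_value:.3g}, dof={dof}")
print("Reject null" if p_value < alpha else "Fail to reject null")
```

On the real data, the same call could be applied to `pd.crosstab(df_train[col], df_train["Attrition"])` for each dichotomous column; the resulting p-values may differ somewhat from those reported by `sklearn`.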



### Applying point-biserial correlation to test the association between the dichotomous dependent variable (Attrition) and each continuous independent variable

In [16]:
cat_to_cont_hypo(df_train)

Null= Age has no significant effect on attrition
Alternate= Age has significant effect on attrition
r=  -0.15324021484505512 p=  1.2870922401901794e-07
Reject null 

Null= DailyRate has no significant effect on attrition
Alternate= DailyRate has significant effect on attrition
r=  -0.03997057745672086 p=  0.1707509004813214
Accept null 

Null= DistanceFromHome has no significant effect on attrition
Alternate= DistanceFromHome has significant effect on attrition
r=  0.06788628399414463 p=  0.019900153438436433
Reject null 

Null= EmployeeNumber has no significant effect on attrition
Alternate= EmployeeNumber has significant effect on attrition
r=  0.0002504441188528549 p=  0.9931547841861832
Accept null 

Null= HourlyRate has no significant effect on attrition
Alternate= HourlyRate has significant effect on attrition
r=  0.003220655341527304 p=  0.9121490105090345
Accept null 

Null= MonthlyIncome has no significant effect on attrition
Alternate= MonthlyIncome has significant effect on 
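
For reference, the point-biserial correlation is the Pearson correlation computed with one dichotomous variable, and SciPy reports a two-sided p-value for it. The sketch below checks that equivalence on synthetic data; the arrays `left` and `age` are invented for illustration and do not come from the dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
left = rng.integers(0, 2, size=200)              # hypothetical binary outcome (0/1)
age = 40 - 4 * left + rng.normal(0, 8, size=200)  # hypothetical continuous feature

# Point-biserial correlation between the binary and the continuous variable.
r_pb, p_pb = stats.pointbiserialr(left, age)

# The same quantity computed as an ordinary Pearson correlation.
r_pe, p_pe = stats.pearsonr(left, age)

alpha = 0.05  # the reported p-value is two-sided, so it is compared with alpha as-is
print(f"point-biserial r={r_pb:.4f}, p={p_pb:.4g}")
print(f"pearson        r={r_pe:.4f}, p={p_pe:.4g}")
print("Reject null" if p_pb < alpha else "Fail to reject null")
```

Because the p-value is already two-sided, halving the significance level as well makes the test more conservative than the stated confidence implies.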

In [17]:
# Checking the pairwise correlation among all the numeric variables (including the encoded dichotomous ones).
df_train.corr(numeric_only=True)

Unnamed: 0,Age,Attrition,DailyRate,DistanceFromHome,Education,EmployeeCount,EmployeeNumber,EnvironmentSatisfaction,Gender,HourlyRate,JobInvolvement,JobLevel,JobSatisfaction,MonthlyIncome,MonthlyRate,NumCompaniesWorked,OverTime,PercentSalaryHike,PerformanceRating,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
Age,1.0,-0.15324,0.012482,-0.013938,0.233764,,-0.007333,0.009879,-0.021648,0.032259,0.03659,0.514338,0.000415,0.502111,0.00354,0.317716,0.004569,-0.000342,0.006435,0.057331,,0.038656,0.687766,-0.019172,-0.022859,0.321559,0.231606,0.208265,0.213519
Attrition,-0.15324,1.0,-0.039971,0.067886,-0.022896,,0.00025,-0.080855,0.036962,0.003221,-0.117724,-0.172187,-0.104915,-0.15982,0.019092,0.063013,0.280567,-0.017458,-0.005086,-0.018682,,-0.155555,-0.169475,-0.060597,-0.064892,-0.132936,-0.177569,-0.028593,-0.158191
DailyRate,0.012482,-0.039971,1.0,0.002321,-0.047937,,-0.061066,0.003934,-0.013165,0.024225,0.027444,0.019826,0.030794,0.022182,-0.004193,0.032132,0.010604,0.030379,0.016907,0.017184,,0.037053,0.016138,0.016243,-0.054233,-0.034897,-0.011901,-0.045152,-0.034546
DistanceFromHome,-0.013938,0.067886,0.002321,1.0,0.028617,,0.035405,-0.027306,-0.020873,0.015656,0.03015,0.002972,0.018541,-0.021086,0.055496,-0.035941,0.022064,0.061598,0.049903,0.009796,,0.039541,0.002572,-0.033421,-0.019077,0.008067,0.004831,-0.00709,0.012125
Education,0.233764,-0.022896,-0.047937,0.028617,1.0,,0.052979,-0.024941,-0.026188,0.01057,0.04833,0.094651,-0.000197,0.092736,-0.044339,0.125164,-0.02863,-0.009462,-0.024892,-0.007322,,0.015329,0.162168,-0.027719,0.026776,0.071924,0.073114,0.062309,0.075595
EmployeeCount,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
EmployeeNumber,-0.007333,0.00025,-0.061066,0.035405,0.052979,,1.0,0.010639,0.025056,0.042415,0.007242,-0.016073,-0.035812,-0.015325,0.013708,-0.001839,-0.034283,-0.028039,-0.045237,-0.06968,,0.039407,-0.011732,0.022487,0.019836,-0.009027,-0.012941,-0.013175,-0.006848
EnvironmentSatisfaction,0.009879,-0.080855,0.003934,-0.027306,-0.024941,,0.010639,1.0,-0.031551,-0.047535,-0.019223,0.01562,-0.040371,0.00623,0.051882,0.015172,0.076239,-0.022586,-0.011404,-0.006873,,0.002477,0.014765,-0.019912,0.029365,0.021276,0.024568,0.039448,0.001051
Gender,-0.021648,0.036962,-0.013165,-0.020873,-0.026188,,0.025056,-0.031551,1.0,-0.012475,0.023352,-0.05959,0.011036,-0.058999,-0.04596,-0.036313,-0.033753,0.030831,0.009496,0.03734,,0.007811,-0.051836,-0.03844,-0.001131,-0.045684,-0.063602,-0.031622,-0.035095
HourlyRate,0.032259,0.003221,0.024225,0.015656,0.01057,,0.042415,-0.047535,-0.012475,1.0,0.049996,-0.04253,-0.075768,-0.029799,0.000841,0.011961,-0.014017,0.006329,0.00838,-0.000737,,0.031339,-0.012272,0.004622,-0.007292,-0.038226,-0.027041,-0.046614,-0.034901


In [18]:
df_train.groupby("Attrition").mean(numeric_only=True)

Attrition,Age,DailyRate,DistanceFromHome,Education,EmployeeCount,EmployeeNumber,EnvironmentSatisfaction,Gender,HourlyRate,JobInvolvement,JobLevel,JobSatisfaction,MonthlyIncome,MonthlyRate,NumCompaniesWorked,OverTime,PercentSalaryHike,PerformanceRating,RelationshipSatisfaction,StandardHours,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,37.408998,806.704499,9.011247,2.906953,1.0,1028.214724,2.735174,0.593047,66.642127,2.759714,2.113497,2.787321,6720.969325,14251.275051,2.578732,0.231084,15.271984,0.156442,2.700409,80.0,0.872188,11.714724,2.831288,2.771984,7.293456,4.496933,2.147239,4.278119
1,33.641414,763.383838,10.489899,2.843434,1.0,1028.621212,2.5,0.641414,66.818182,2.535354,1.606061,2.479798,4710.808081,14613.767677,3.0,0.570707,15.10101,0.151515,2.646465,80.0,0.515152,8.176768,2.621212,2.651515,5.131313,2.782828,1.909091,2.787879


In [19]:
# Dropping all the features from the data which are irrelevant for the model
df_train=df_train.drop(["DailyRate","Education","EmployeeCount","EmployeeNumber","MonthlyRate","PerformanceRating","StandardHours","TrainingTimesLastYear",
               "WorkLifeBalance","HourlyRate","Over18","EducationField","PercentSalaryHike"],axis=1)

In [22]:
# Encoding all the categorical variables using a function.
df_train=label_encode(df_train)

In [23]:
# Separating the dependent variable from the data.
y_train=df_train.pop("Attrition")
x_train=df_train

## Test Data Analysis

<font size=5>Applying to the test data every step that was applied to the training data.</font>

In [24]:
df_test["Gender"]=df_test["Gender"].map({"Male":1,"Female":0})
df_test["OverTime"]=df_test["OverTime"].map({"Yes":1,"No":0})
df_test["PerformanceRating"]=df_test["PerformanceRating"].map({4:1,3:0})

attrition_dict = {"Yes":1, "No": 0}
df_test.Attrition=process_attrition(df_test,"Attrition",attrition_dict)

In [25]:
df_test=label_encode(df_test)
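
`label_encode` fits a fresh `LabelEncoder` on whichever frame it receives, so the integer codes agree between train and test only if every category appears in both. A minimal sketch of the alternative, fitting one encoder on the training data and reusing it on the test data, is shown below. It assumes toy frames (`train_toy`, `test_toy`) and swaps in `OrdinalEncoder`, which is not what the notebook uses.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy frames; the test frame lacks the "Human Resources" category.
train_toy = pd.DataFrame(
    {"Department": ["Sales", "Research & Development", "Human Resources", "Sales"]}
)
test_toy = pd.DataFrame(
    {"Department": ["Sales", "Sales", "Research & Development"]}
)

# Fit once on train, then reuse the same mapping on test.
# Categories unseen during fit are encoded as -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = encoder.fit_transform(train_toy[["Department"]])
test_codes = encoder.transform(test_toy[["Department"]])

# For contrast: refitting on the test frame alone yields different codes.
refit_codes = LabelEncoder().fit_transform(test_toy["Department"])

print("shared encoder:", test_codes.ravel().tolist())
print("refit on test: ", refit_codes.tolist())
```

In this toy case "Sales" is encoded as 2 by the shared encoder but as 1 when the encoder is refit on the test frame, which is the kind of mismatch that silently degrades predictions.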

In [26]:
df_test=df_test.drop(["DailyRate","Education","EmployeeCount","EmployeeNumber","MonthlyRate","PerformanceRating","StandardHours","TrainingTimesLastYear",
               "WorkLifeBalance","HourlyRate","Over18","EducationField","PercentSalaryHike"],axis=1)

In [27]:
df_test.head()

Unnamed: 0,Age,Attrition,BusinessTravel,Department,DistanceFromHome,EnvironmentSatisfaction,Gender,JobInvolvement,JobLevel,JobRole,JobSatisfaction,MaritalStatus,MonthlyIncome,NumCompaniesWorked,OverTime,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
1041,28,0,2,2,5,4,1,3,2,7,1,2,8463,0,0,4,0,6,5,4,1,3
184,53,0,2,1,13,4,0,4,2,4,1,0,4450,1,0,3,2,5,4,2,1,3
1222,24,1,2,0,22,4,1,1,1,1,3,1,1555,1,0,3,1,1,1,0,0,0
67,45,0,2,1,7,2,1,3,3,6,1,0,9724,2,0,3,1,25,1,0,0,0
220,36,0,2,1,5,4,1,3,2,2,2,2,5914,8,0,4,0,16,13,11,3,7


In [28]:
y_test=df_test.pop("Attrition")
x_test=df_test

## Building a Logistic Regression Model

In [29]:
# Creating a Logistic Regression model instance
model = LogisticRegression()

# Training the model using the training data
model.fit(x_train, y_train)

# Predicting the target values for the test data
y_test_predict = model.predict(x_test)
print(y_test_predict)

# Calculating the accuracy score of the model on the test data
model_score = model.score(x_test, y_test)
print(model_score)

# Generating and printing the confusion matrix to evaluate model performance
print(metrics.confusion_matrix(y_test, y_test_predict))

# Generating and printing the classification report to assess precision, recall, and F1-score
print(metrics.classification_report(y_test, y_test_predict))

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
0.8741496598639455
[[253   2]
 [ 35   4]]
              precision    recall  f1-score   support

           0       0.88      0.99      0.93       255
           1       0.67      0.10      0.18        39

    accuracy                           0.87       294
   macro avg       0.77      0.55      0.55       294
weighted avg       0.85      0.87      0.83       294
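
The low recall for class 1 in the report above is typical of an unweighted logistic regression on imbalanced data. One common adjustment, sketched here on synthetic data from `make_classification` rather than on the attrition dataset, is to standardize the features and set `class_weight="balanced"`. The sample size, feature count, and class ratio are assumptions chosen only to mimic an imbalanced problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data (about 15% positives); not the attrition dataset.
X, y = make_classification(
    n_samples=1500, n_features=10, weights=[0.85, 0.15], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Same model twice: once unweighted, once with balanced class weights.
plain = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
balanced = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)

recalls = {}
for name, clf in [("plain", plain), ("balanced", balanced)]:
    clf.fit(X_tr, y_tr)
    recalls[name] = recall_score(y_te, clf.predict(X_te))
    print(name, "minority-class recall:", round(recalls[name], 2))
```

Balanced weights usually raise minority-class recall at the cost of precision, so the right trade-off depends on how costly a missed attrition case is compared with a false alarm.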



<font size=5>Conclusion:</font>

1. **Accuracy Score**: The model achieved an accuracy of approximately 0.874, meaning it correctly classified about 87.4% of the 294 instances in the test set. Accuracy alone is misleading on an imbalanced dataset like this one: predicting "no attrition" for every employee would already score 255/294 ≈ 0.867.

2. **Confusion Matrix**:
   - True Positives (TP): 4
   - True Negatives (TN): 253
   - False Positives (FP): 2
   - False Negatives (FN): 35

   The confusion matrix shows how the errors are distributed across the two classes. The model identifies the majority class (0, no attrition) well, correctly classifying 253 of 255 cases, but it detects only 4 of the 39 actual attrition cases (class 1).

3. **Precision, Recall, and F1-Score**:
   - Precision for class 0: 0.88
   - Precision for class 1: 0.67
   - Recall for class 0: 0.99
   - Recall for class 1: 0.10
   - F1-score for class 0: 0.93
   - F1-score for class 1: 0.18

   Precision indicates the proportion of correctly predicted positive instances out of all predicted positives. Recall represents the proportion of correctly predicted positive instances out of all actual positives. The F1-score balances precision and recall, providing an overall measure of a model's performance.

   Precision and recall for class 0 are both high (0.88 and 0.99), so the model is effective at identifying employees who stay. For class 1, precision is 0.67: of the 6 employees predicted to leave, 4 actually left and 2 were false positives. Recall for class 1 is only 0.10: the model misses 35 of the 39 actual attrition cases, which makes it of limited use for the stated goal of identifying likely leavers.
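
The reported figures can be reproduced directly from the four confusion-matrix counts. The short sketch below does the arithmetic by hand, taking class 1 (attrition) as the positive class; the variable names are illustrative.

```python
# Counts taken from the confusion matrix above (positive class = 1, attrition).
tn, fp, fn, tp = 253, 2, 35, 4

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Class 1 (attrition) metrics.
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Class 0 (no attrition) metrics: swap the roles of the two classes.
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print(f"accuracy    = {accuracy:.3f}")
print(f"class 1: precision={precision_1:.2f}, recall={recall_1:.2f}, f1={f1_1:.2f}")
print(f"class 0: precision={precision_0:.2f}, recall={recall_0:.2f}, f1={f1_0:.2f}")
```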