## Part 1: Preprocessing

In [1]:
# Import our dependencies
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras import layers

from sklearn.preprocessing import LabelEncoder, OneHotEncoder


#  Import and read the attrition data
attrition_df = pd.read_csv('https://static.bc-edx.com/ai/ail-v-1-0/m19/lms/datasets/attrition.csv')
attrition_df.head()

Unnamed: 0,Age,Attrition,BusinessTravel,Department,DistanceFromHome,Education,EducationField,EnvironmentSatisfaction,HourlyRate,JobInvolvement,...,PerformanceRating,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,Sales,1,2,Life Sciences,2,94,3,...,3,1,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,Research & Development,8,1,Life Sciences,3,61,2,...,4,4,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,Research & Development,2,2,Other,4,92,2,...,3,2,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,Research & Development,3,4,Life Sciences,4,56,3,...,3,3,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,Research & Development,2,1,Medical,1,40,3,...,3,4,1,6,3,3,2,2,2,2


In [2]:
# Determine the number of unique values in each column
display(attrition_df.nunique())

# [Optional] Attrition and Department value counts:
# display(attrition_df["Attrition"].value_counts())
# display(attrition_df["Department"].value_counts())
# [Optional] List columns
# display(attrition_df.columns)

Age                         43
Attrition                    2
BusinessTravel               3
Department                   3
DistanceFromHome            29
Education                    5
EducationField               6
EnvironmentSatisfaction      4
HourlyRate                  71
JobInvolvement               4
JobLevel                     5
JobRole                      9
JobSatisfaction              4
MaritalStatus                3
NumCompaniesWorked          10
OverTime                     2
PercentSalaryHike           15
PerformanceRating            2
RelationshipSatisfaction     4
StockOptionLevel             4
TotalWorkingYears           40
TrainingTimesLastYear        7
WorkLifeBalance              4
YearsAtCompany              37
YearsInCurrentRole          19
YearsSinceLastPromotion     16
YearsWithCurrManager        18
dtype: int64

In [3]:
# Create y_df (as a copy of attrition_df) with the Attrition and Department columns
y_df = attrition_df[["Attrition", "Department"]].copy()

In [4]:
# Create a list of at least 10 column names to use as X data
# NOTE: 10 columns remain after the exclusions below
column_names = [x_column for x_column in attrition_df.columns if x_column not in [
    "Attrition", "Department", "BusinessTravel", "Education",  
    "MaritalStatus", "PercentSalaryHike", "EducationField", "JobRole",
    "PerformanceRating", "RelationshipSatisfaction", "StockOptionLevel", 
    "TotalWorkingYears", "TrainingTimesLastYear", "YearsInCurrentRole", 
    "YearsWithCurrManager", "YearsAtCompany", "YearsSinceLastPromotion"]]

# Create X_df as copy of attrition_df, using your selected columns
X_df = attrition_df[column_names].copy()
# display(X_df)

# Show the data types for X_df
display(X_df.dtypes)

Age                         int64
DistanceFromHome            int64
EnvironmentSatisfaction     int64
HourlyRate                  int64
JobInvolvement              int64
JobLevel                    int64
JobSatisfaction             int64
NumCompaniesWorked          int64
OverTime                   object
WorkLifeBalance             int64
dtype: object

## Perform Label Encoding on the Binary Nominal Fields
### Because each field is binary, label encoding carries no risk of implying a false ordinal relationship
#### X_df -> OverTime, y_df -> Attrition

In [5]:

# [Optional] Encode X_df and y_df fields
# print("Checking columns in X_df before splitting:")
# print(X_df.columns)

# Step 1: Label Encode `OverTime` (Yes=1, No=0)
# Initialize LabelEncoder
overtime_le = LabelEncoder()
# display(X_df["OverTime"])
# Perform Fit Transform
X_df["OverTime"] = overtime_le.fit_transform(X_df["OverTime"])

# Step 2: LabelEncoder 'Attrition'  (Yes=1, No=0)
attrition_le = LabelEncoder()

# Perform Fit Transform
y_df["Attrition"] = attrition_le.fit_transform(y_df["Attrition"])

# Step 3: Check that y_df['Attrition'] and X_df['OverTime'] are encoded
display(X_df.head())
display(y_df.head())



Unnamed: 0,Age,DistanceFromHome,EnvironmentSatisfaction,HourlyRate,JobInvolvement,JobLevel,JobSatisfaction,NumCompaniesWorked,OverTime,WorkLifeBalance
0,41,1,2,94,3,2,4,8,1,1
1,49,8,3,61,2,2,2,1,0,3
2,37,2,4,92,2,1,3,6,1,3
3,33,3,4,56,3,1,3,1,1,3
4,27,2,1,40,3,1,2,9,0,3


Unnamed: 0,Attrition,Department
0,1,Sales
1,0,Research & Development
2,1,Research & Development
3,0,Research & Development
4,0,Research & Development
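To confirm which integer each label received, the fitted encoder's `classes_` attribute can be inspected. The sketch below uses a small made-up list of labels rather than the attrition data; `LabelEncoder` sorts its classes, so "No" should map to 0 and "Yes" to 1.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample labels standing in for the OverTime / Attrition columns
sample_labels = ["Yes", "No", "No", "Yes"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(sample_labels)

# classes_ is sorted, so each label's position in it is its integer code
label_to_code = {str(label): code for code, label in enumerate(encoder.classes_)}

print(label_to_code)
print(encoded.tolist())
```

The same check could be run on `overtime_le.classes_` and `attrition_le.classes_` in the notebook itself.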


# Perform Test Train Split

In [6]:
# Split the data into training and testing sets
from sklearn.model_selection import train_test_split

# Train-Test Split (Before OneHotEncoding!)
X_train, X_test, y_train, y_test = train_test_split(X_df, y_df, test_size=0.2, random_state=42)

print("🚀 Checking columns in X_train after splitting:")
print(X_train.columns)



🚀 Checking columns in X_train after splitting:
Index(['Age', 'DistanceFromHome', 'EnvironmentSatisfaction', 'HourlyRate',
       'JobInvolvement', 'JobLevel', 'JobSatisfaction', 'NumCompaniesWorked',
       'OverTime', 'WorkLifeBalance'],
      dtype='object')
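If the target classes are imbalanced, one option (not used in this notebook) is to pass `stratify` to `train_test_split` so that both splits keep the same class proportions. The sketch below uses hypothetical toy arrays; in the notebook the stratification column would presumably be `y_df["Attrition"]`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 20% of them in the positive class
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 20 + [0] * 80)

# stratify=y keeps the 80/20 class ratio in both the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```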


In [7]:
# [Optional] Make sure X_test and X_train variables are numeric
print(X_test.dtypes)
print()
print(X_train.dtypes)

Age                        int64
DistanceFromHome           int64
EnvironmentSatisfaction    int64
HourlyRate                 int64
JobInvolvement             int64
JobLevel                   int64
JobSatisfaction            int64
NumCompaniesWorked         int64
OverTime                   int64
WorkLifeBalance            int64
dtype: object

Age                        int64
DistanceFromHome           int64
EnvironmentSatisfaction    int64
HourlyRate                 int64
JobInvolvement             int64
JobLevel                   int64
JobSatisfaction            int64
NumCompaniesWorked         int64
OverTime                   int64
WorkLifeBalance            int64
dtype: object


In [8]:
# Create a StandardScaler
sc = StandardScaler()

# Scale the training and testing data
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)
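The reason the scaler is fitted on the training data only can be shown with a NumPy-only sketch of the same arithmetic. The arrays are hypothetical; the point is that the test rows are scaled with the training mean and standard deviation, so no information from the test set leaks into preprocessing.

```python
import numpy as np

# Hypothetical feature matrices (rows = samples, columns = features)
X_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_demo = np.array([[2.0, 40.0]])

# Statistics come from the training data only (population std, ddof=0)
train_mean = X_train_demo.mean(axis=0)
train_std = X_train_demo.std(axis=0)

# Both splits are transformed with the training statistics
X_train_demo_scaled = (X_train_demo - train_mean) / train_std
X_test_demo_scaled = (X_test_demo - train_mean) / train_std

print(X_train_demo_scaled)
print(X_test_demo_scaled)
```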

# Perform OneHotEncoding on y_df['Department']

In [9]:
from sklearn.preprocessing import OneHotEncoder

# Step 1: Create a OneHotEncoder for the Department column
# NOTE: train_test_split was performed on y_df, so y_train and y_test
# already contain the Department column. Dropping it now would not change anything.
department_ohe = OneHotEncoder(sparse_output=False, handle_unknown="ignore")

# Convert your y data to numeric data types however you see fit
# Step 2: Fit the encoder to the training data
# Step 3: Create two new variables by applying the encoder
# to the training and testing data
y_train_department_encoded = department_ohe.fit_transform(y_train[["Department"]])
y_test_department_encoded = department_ohe.transform(y_test[["Department"]])

# Step 4: Inspect y_train and y_test. The Department column itself stays as text;
# the one-hot values are held in the *_department_encoded arrays above
display(y_train.head())
display(y_test.head())


Unnamed: 0,Attrition,Department
1097,0,Research & Development
727,0,Research & Development
254,0,Sales
1175,0,Research & Development
1341,0,Research & Development


Unnamed: 0,Attrition,Department
1041,0,Sales
184,0,Research & Development
1222,1,Human Resources
67,0,Research & Development
220,0,Research & Development


In [10]:
# NOTE: Label Encoder was used on Attrition column
# before Train Test Split, as it is a binary classification
# So this step is Not Applicable

# Create a OneHotEncoder for the Attrition column
# N/A

# Fit the encoder to the training data
# N/A

# Create two new variables by applying the encoder
# to the training and testing data

#N/A

## Part 2: Create, Compile, and Train the Model

In [11]:
# Find the number of columns in the X training data.
X_train_columns = X_train_scaled.shape[1]
display(X_train_columns)


# Create the input layer with the number of columns
input_layer = layers.Input(shape=(X_train_scaled.shape[1],), name="input_features")

# Create at least two shared layers
shared_layer1 = layers.Dense(64, activation="relu")(input_layer)
shared_backbone_layer = layers.Dense(64, activation="relu")(shared_layer1)

10

In [12]:
# Create a branch for Department
# with a hidden layer and an output layer

# Create the hidden layer
department_dense = layers.Dense(32, activation="relu")(shared_backbone_layer)

# Create the output layer
# Departments are multi-class and mutually exclusive, which normally calls for softmax

# Note: y_train_department_encoded contains the one-hot encoded department classes

# IMPORTANT: THE NAME OF YOUR LAYER MUST MATCH THE NAME IN MODEL.FIT
# This run uses sigmoid instead, which scores each one-hot Department column
# independently (the softmax alternative is left commented out below)
department_output = layers.Dense(y_train_department_encoded.shape[1],
                                #  activation='softmax',
                                activation='sigmoid',
                                 name='department_output')(department_dense)
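The comments above weigh softmax against sigmoid for the Department output. The practical difference can be seen numerically in a small NumPy sketch with hypothetical pre-activation values for three departments: softmax produces one probability distribution across the classes, while sigmoid scores each class on its own, so its outputs need not sum to 1.

```python
import numpy as np

# Hypothetical pre-activation values (logits) for three departments
logits = np.array([2.0, 1.0, -1.0])

# Softmax: exponentiate and normalize so the outputs sum to 1
exp_shifted = np.exp(logits - logits.max())  # subtract max for stability
softmax_out = exp_shifted / exp_shifted.sum()

# Sigmoid: each output is squashed into (0, 1) independently
sigmoid_out = 1.0 / (1.0 + np.exp(-logits))

print("softmax:", softmax_out, "sum =", softmax_out.sum())
print("sigmoid:", sigmoid_out, "sum =", sigmoid_out.sum())
```

Both activations rank the classes in the same order here, so the predicted department (the arg max) is the same; what differs is whether the scores can be read as a single distribution over mutually exclusive classes.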

In [13]:
# Create a branch for Attrition
# with a hidden layer and an output layer

# Create the hidden layer
attrition_dense = layers.Dense(32, activation="relu")(shared_backbone_layer)

# Create the output layer
# NOTE: Attrition is a binary classification, so a single neuron is enough
# instead of one neuron per class
# IMPORTANT: THE NAME OF YOUR LAYER MUST MATCH THE NAME IN MODEL.fit
attrition_output = layers.Dense(1,
                                 activation='sigmoid',
                                 name='attrition_output')(attrition_dense)

In [14]:
# Create the model
model = Model(inputs=input_layer, outputs={
    'department_output':department_output,
    'attrition_output': attrition_output
})

# Compile the model
model.compile(optimizer='adam',
              loss={'department_output': 'categorical_crossentropy',
                    'attrition_output': 'binary_crossentropy'},
              # Track accuracy separately for each named output
              metrics={'department_output': 'accuracy',
                       'attrition_output': 'accuracy'})

# Summarize the model
model.summary()
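The output of `model.summary()` is not shown here, but the trainable parameter counts it would report can be worked out by hand. A Dense layer has `inputs * units` weights plus `units` biases. The sketch below assumes 10 input features (the value displayed in In [11]) and 3 Department classes (from the `nunique` output); the helper name `dense_params` is made up for this illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases for one fully connected (Dense) layer."""
    return n_inputs * n_units + n_units

layer_params = {
    "shared_layer1": dense_params(10, 64),
    "shared_backbone_layer": dense_params(64, 64),
    "department_dense": dense_params(64, 32),
    "department_output": dense_params(32, 3),
    "attrition_dense": dense_params(64, 32),
    "attrition_output": dense_params(32, 1),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print("total:", total_params)
```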

In [15]:
# Train the model

# 🚀 Step 1: Ensure X_train_scaled is a NumPy array
X_train_scaled_np = np.array(X_train_scaled)

# 🚀 Step 2: Ensure y_train encoded department data is a numpy array
y_train_department_np = np.array(y_train_department_encoded) # One-hot encoded labels

# IMPORTANT: Make Sure the y_train Attrition Column is a 2D array not 1D
y_train_attrition_np = np.array(y_train["Attrition"]).reshape(-1, 1)

y_train_dict = {
    "department_output": y_train_department_np,
    "attrition_output": y_train_attrition_np  
}

# Step 3: Train the model
model.fit(
    X_train_scaled_np,
    y_train_dict,
    # Adjust the number of epochs as needed (try 10-20)
    epochs=10,  
    # epochs=11,  

    batch_size=32,  # Adjust the batch size based on your available memory
    validation_split=0.2  # Hold out the last 20% of the training data for validation
)

Epoch 1/10


2025-03-09 16:27:23.379857: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] PluggableGraphOptimizer failed: INVALID_ARGUMENT: Failed to deserialize the `graph_buf`.


30/30 ━━━━━━━━━━━━━━━━━━━━ 3s 41ms/step - attrition_output_accuracy: 0.7332 - attrition_output_loss: 0.5917 - department_output_accuracy: 0.5705 - department_output_loss: 0.9765 - loss: 1.5692 - val_attrition_output_accuracy: 0.7966 - val_attrition_output_loss: 0.4838 - val_department_output_accuracy: 0.6314 - val_department_output_loss: 0.8262 - val_loss: 1.3375
Epoch 2/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 25ms/step - attrition_output_accuracy: 0.8424 - attrition_output_loss: 0.4314 - department_output_accuracy: 0.6732 - department_output_loss: 0.7530 - loss: 1.1842 - val_attrition_output_accuracy: 0.7966 - val_attrition_output_loss: 0.4442 - val_department_output_accuracy: 0.6186 - val_department_output_loss: 0.7980 - val_loss: 1.2735
Epoch 3/10
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 25ms/step - attrition_output_accuracy: 0.8336 - attrition_output_loss: 0.4151 - department_output_accuracy: 0.6606

<keras.src.callbacks.history.History at 0x31770f160>
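The `30/30` step count in the training log follows from the split sizes. A short arithmetic sketch, assuming 1,470 total rows (the sum of the Attrition value counts) and 294 test rows (the support shown in the classification reports): Keras sets aside the last `validation_split` fraction of the rows passed to `fit` and batches the remainder.

```python
import math

n_train_rows = 1470 - 294   # rows left after the 80/20 train-test split
validation_split = 0.2
batch_size = 32

# Rows actually used for weight updates vs. rows held out for validation
n_fit_rows = int(n_train_rows * (1 - validation_split))
n_val_rows = n_train_rows - n_fit_rows

# Number of batches (steps) per epoch, counting a final partial batch
steps_per_epoch = math.ceil(n_fit_rows / batch_size)

print(n_fit_rows, n_val_rows, steps_per_epoch)
```

Only about 940 rows drive the weight updates, which is worth keeping in mind when comparing small changes in validation accuracy between runs.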

In [16]:
# Evaluate the model with the testing data
# Step 1: Convert X_test_scaled to numpy array
X_test_np = np.array(X_test_scaled)

# Step 2: Split y_test Attrition and Department data into 
# separate NumPy arrays for each output 
y_test_department_np = np.array(y_test_department_encoded)

# IMPORTANT: Make sure the Attrition column is a 2D array, not 1D;
# reshape(-1, 1) converts it to shape (n_samples, 1)
y_test_attrition_np = np.array(y_test["Attrition"]).reshape(-1, 1)

# Step 3: Combine the y_test Attrition and Department
# NumPy arrays into a dictionary:
y_test_dict = {
    'department_output': y_test_department_np, # Multi-class labels
    'attrition_output': y_test_attrition_np # Binary labels
}
results = model.evaluate(X_test_np, y_test_dict)



10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - attrition_output_accuracy: 0.8728 - attrition_output_loss: 0.3589 - department_output_accuracy: 0.6224 - department_output_loss: 0.8203 - loss: 1.1837


In [17]:
# [Optional] Get Model Predictions and Convert to Human Readable Text

# Step 1: Get predictions (pass in a single row to get a prediction for just that row)
predictions = model.predict(X_test_np)
# print("Raw Predictions:", predictions)

# Step 2: Decode the actual y_test values back to their original labels
department_ytest_mapping = department_ohe.inverse_transform(y_test_department_np)
# LabelEncoder expects a 1D array, so flatten the (n_samples, 1) column first
attrition_ytest_mapping = attrition_le.inverse_transform(y_test_attrition_np.ravel())

# Step 3: Pull out Dependent Variable Predictions
department_pred_mapping = predictions["department_output"]
attrition_pred_mapping = predictions["attrition_output"]

# Step 4: Convert Predicted Data into the correct Numeric Format and Dimensions

# inverse_transform picks the highest-scoring column in each row;
# ravel() flattens the result into a one-dimensional array, as it was originally
department_pred = department_ohe.inverse_transform(department_pred_mapping).ravel()

# Convert attrition_pred_mapping from probability to discrete data (0 or 1)
attrition_pred_labels = (attrition_pred_mapping > 0.5).astype(int).ravel()
attrition_pred = attrition_le.inverse_transform(attrition_pred_labels)

# NOTE: the lines below print the decoded actual y_test labels;
# the decoded predictions are held in department_pred and attrition_pred
print("Human Readable Predictions")
print("DepartmentMap:", department_ytest_mapping[:5])
print("AttritionMap:", attrition_ytest_mapping[:5])

10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
Human Readable Predictions
DepartmentMap: [['Sales']
 ['Research & Development']
 ['Human Resources']
 ['Research & Development']
 ['Research & Development']]
AttritionMap: ['No' 'No' 'Yes' 'No' 'No']




In [18]:
# Print the accuracy for both department and attrition
print("Results:", results)

department_accuracy = results[4] 
attrition_accuracy = results[3] 

# 🚀 Print the accuracy scores
print(f"🎯 Department Classification Accuracy: {department_accuracy:.4f}")
print(f"🎯 Attrition Classification Accuracy: {attrition_accuracy:.4f}")

Results: [1.1341572999954224, 0.7802050113677979, 0.3293530344963074, 0.8877550959587097, 0.6496598720550537]
🎯 Department Classification Accuracy: 0.6497
🎯 Attrition Classification Accuracy: 0.8878
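Positional indexes into `results` depend on the order in which Keras reports the metrics, which is easy to get wrong. `model.evaluate` also accepts `return_dict=True`, which returns the values keyed by name. The sketch below uses a hand-written stand-in dictionary (not a real call); its key names are assumed to match those in the training log, and its values are copied from the printed `Results` list.

```python
# Stand-in for:
#   results = model.evaluate(X_test_np, y_test_dict, return_dict=True)
# Key names are an assumption based on the names shown in the training log.
results_by_name = {
    "attrition_output_accuracy": 0.8877550959587097,
    "department_output_accuracy": 0.6496598720550537,
}

# Look up each score by name instead of by list position
department_accuracy = results_by_name["department_output_accuracy"]
attrition_accuracy = results_by_name["attrition_output_accuracy"]

print(f"Department Classification Accuracy: {department_accuracy:.4f}")
print(f"Attrition Classification Accuracy: {attrition_accuracy:.4f}")
```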


In [19]:
# [Optional] Add the Predicted Results from model.predict and 
# add the y_test Attrition and Department to compare model performance

# Convert Actual Attrition Values from binary to YES and NO
X_test["Attrition Actual"] = y_test["Attrition"].map({1: "Yes", 0: "No"})
X_test["Department Actual"] = y_test["Department"]
X_test["Attrition Prediction"] = attrition_pred
X_test["Department Prediction"] = department_pred

# ✅ Removes old index, sets a new one
X_test = X_test.reset_index(drop=True)  
# display(X_test.head())
# display(X_test.tail())
display(y_df["Attrition"].value_counts())
display(y_df["Department"].value_counts())

# Run Classification Report:
from sklearn.metrics import classification_report

# Generate classification report for Attrition
report_attrition = classification_report(X_test["Attrition Actual"], X_test["Attrition Prediction"])

# Generate classification report for Department
report_department = classification_report(X_test["Department Actual"], X_test["Department Prediction"])

# Print both reports
print("Attrition Classification Report:\n", report_attrition)
print("\nDepartment Classification Report:\n", report_department)

Attrition
0    1233
1     237
Name: count, dtype: int64

Department
Research & Development    961
Sales                     446
Human Resources            63
Name: count, dtype: int64

Attrition Classification Report:
               precision    recall  f1-score   support

          No       0.89      0.99      0.94       255
         Yes       0.75      0.23      0.35        39

    accuracy                           0.89       294
   macro avg       0.82      0.61      0.65       294
weighted avg       0.87      0.89      0.86       294


Department Classification Report:
                         precision    recall  f1-score   support

       Human Resources       0.00      0.00      0.00        13
Research & Development       0.69      0.88      0.77       196
                 Sales       0.41      0.22      0.29        85

              accuracy                           0.65       294
             macro avg       0.37      0.37      0.35       294
          weighted avg       0.58      0.65      0.60       294



  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
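The Attrition report above can be tied back to raw counts. The confusion-matrix values in this sketch are reconstructed from the rounded report figures and the support counts (an inference, not an output of the notebook). They show how an accuracy near 0.89 can coexist with a "Yes" recall of only 0.23, and how close that accuracy is to simply predicting "No" for everyone.

```python
# Reconstructed counts (assumption): 255 actual "No" rows, 39 actual "Yes" rows
true_no, false_yes = 252, 3   # actual "No":  predicted "No" / predicted "Yes"
false_no, true_yes = 30, 9    # actual "Yes": predicted "No" / predicted "Yes"

total = true_no + false_yes + false_no + true_yes

accuracy = (true_no + true_yes) / total
recall_yes = true_yes / (true_yes + false_no)
precision_yes = true_yes / (true_yes + false_yes)
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

# Accuracy of a trivial model that always predicts "No"
baseline_accuracy = (true_no + false_yes) / total

print(f"accuracy:          {accuracy:.4f}")
print(f"recall (Yes):      {recall_yes:.2f}")
print(f"precision (Yes):   {precision_yes:.2f}")
print(f"f1 (Yes):          {f1_yes:.2f}")
print(f"always-'No' model: {baseline_accuracy:.4f}")
```

The trained model beats the always-"No" baseline by only about two percentage points of accuracy, which is why the per-class recall and F1 scores are more informative here.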


# Summary

In the provided space below, briefly answer the following questions.

1. Is accuracy the best metric to use on this data? Why or why not?

2. What activation functions did you choose for your output layers, and why?

3. Can you name a few ways that this model might be improved?


1. No, accuracy is not the best metric here because it does not account for class imbalance. Most employees stay at a company, so those who leave are usually the minority group. The data supports this: the "No" attrition class (employees who stay) is more than 5 times larger than the "Yes" class (employees who leave), with 1,233 versus 237 records. A model trained on this imbalanced data can become biased toward the majority class and still report a misleadingly high accuracy score. The classification report confirms the issue: recall for the "No" class is 99%, meaning the model correctly identifies almost all employees who stay, while recall for the "Yes" class is only 23%, meaning it misses most employees who leave.

    - Because of this imbalance, the precision, recall, and F1-scores from the classification report are better metrics for evaluating the model.

2. Predictions:
    - Attrition: A sigmoid activation function was used in the attrition output layer because attrition is a binary classification problem (e.g., “Yes” or “No”). The sigmoid function outputs probabilities between 0 and 1, making it suitable for binary classification, as it allows the model to predict the probability of an employee leaving the company.

    - Department: Initially, a softmax activation function was chosen since department prediction was treated as a mutually exclusive multi-class classification problem. Softmax ensures that the sum of all output probabilities equals 1, allowing the model to assign the highest probability to a single department.

    - However, the activation function was later changed to sigmoid because the Department field was transformed using OneHotEncoder, splitting it into three separate columns (one for each department). With sigmoid, each output neuron acts independently and produces its own probability between 0 and 1 of belonging to that department, so the three outputs are not forced to sum to 1. Because each employee belongs to exactly one department, softmax remains the conventional pairing with categorical_crossentropy for this output; the sigmoid version is the one that produced the results reported above.

3. The model could potentially be improved by reducing the number of neurons in the shared backbone layer before branching. Initially, I used 128 neurons, which seemed to cause overfitting; reducing the number to 64 gave a slight improvement.

    - Additionally, adjusting the number of training epochs could further improve performance. Doubling the epochs from 10 to 20 actually reduced accuracy, but a small increase to 11 epochs led to a slight improvement in recall for the Department classes.