## Part 1: Preprocessing

In [1]:
# Import our dependencies
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras import layers

#  Import and read the attrition data
attrition_df = pd.read_csv('https://static.bc-edx.com/ai/ail-v-1-0/m19/lms/datasets/attrition.csv')
attrition_df.head()

,Age,Attrition,BusinessTravel,Department,DistanceFromHome,Education,EducationField,EnvironmentSatisfaction,HourlyRate,JobInvolvement,...,PerformanceRating,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,Sales,1,2,Life Sciences,2,94,3,...,3,1,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,Research & Development,8,1,Life Sciences,3,61,2,...,4,4,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,Research & Development,2,2,Other,4,92,2,...,3,2,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,Research & Development,3,4,Life Sciences,4,56,3,...,3,3,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,Research & Development,2,1,Medical,1,40,3,...,3,4,1,6,3,3,2,2,2,2


In [6]:
# Determine the number of unique values in each column
attrition_df.nunique()

Age                         43
Attrition                    2
BusinessTravel               3
Department                   3
DistanceFromHome            29
Education                    5
EducationField               6
EnvironmentSatisfaction      4
HourlyRate                  71
JobInvolvement               4
JobLevel                     5
JobRole                      9
JobSatisfaction              4
MaritalStatus                3
NumCompaniesWorked          10
OverTime                     2
PercentSalaryHike           15
PerformanceRating            2
RelationshipSatisfaction     4
StockOptionLevel             4
TotalWorkingYears           40
TrainingTimesLastYear        7
WorkLifeBalance              4
YearsAtCompany              37
YearsInCurrentRole          19
YearsSinceLastPromotion     16
YearsWithCurrManager        18
dtype: int64

In [7]:
# Create y_df with the Attrition and Department columns
y_df = attrition_df[['Attrition', 'Department']]


In [8]:
# Create a list of at least 10 column names to use as X data
x_columns = ['Age', 'DistanceFromHome', 'Education', 'EnvironmentSatisfaction', 
             'HourlyRate', 'JobInvolvement', 'JobSatisfaction', 'NumCompaniesWorked', 
             'PercentSalaryHike', 'TotalWorkingYears']


# Create X_df using your selected columns
X_df = attrition_df[x_columns]


# Show the data types for X_df
X_df.dtypes


Age                        int64
DistanceFromHome           int64
Education                  int64
EnvironmentSatisfaction    int64
HourlyRate                 int64
JobInvolvement             int64
JobSatisfaction            int64
NumCompaniesWorked         int64
PercentSalaryHike          int64
TotalWorkingYears          int64
dtype: object

In [9]:
X_df.head(5)

,Age,DistanceFromHome,Education,EnvironmentSatisfaction,HourlyRate,JobInvolvement,JobSatisfaction,NumCompaniesWorked,PercentSalaryHike,TotalWorkingYears
0,41,1,2,2,94,3,4,8,11,8
1,49,8,1,3,61,2,2,1,23,10
2,37,2,2,4,92,2,3,6,15,7
3,33,3,4,4,56,3,3,1,11,8
4,27,2,1,1,40,3,2,9,12,6


In [10]:
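# train_test_split was already imported in the first cell; re-importing it is harmless.
# The split itself happens in Part 2, after the features are scaled and the targets encoded.
from sklearn.model_selection import train_test_split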


In [11]:
# Convert X_df to numeric data types.
# Every selected column is already int64, so this is only a safeguard:
# with errors='coerce', any non-numeric entry would become NaN instead of raising an error.
X_df = X_df.apply(pd.to_numeric, errors='coerce')

# Check for any missing values in X_df after coercion
missing_values = X_df.isnull().sum()
print("Missing values in each column:\n", missing_values)

# Drop rows with missing values, if any.
# Caution: if rows were dropped here, the same rows would also have to be
# dropped from y_df to keep features and targets aligned.
X_df = X_df.dropna()

# Verify that there are no missing values left
print("Missing values after dropping rows:\n", X_df.isnull().sum())


Missing values in each column:
 Age                        0
DistanceFromHome           0
Education                  0
EnvironmentSatisfaction    0
HourlyRate                 0
JobInvolvement             0
JobSatisfaction            0
NumCompaniesWorked         0
PercentSalaryHike          0
TotalWorkingYears          0
dtype: int64
Missing values after dropping rows:
 Age                        0
DistanceFromHome           0
Education                  0
EnvironmentSatisfaction    0
HourlyRate                 0
JobInvolvement             0
JobSatisfaction            0
NumCompaniesWorked         0
PercentSalaryHike          0
TotalWorkingYears          0
dtype: int64
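Because the attrition columns are already clean, the coercion step above never does anything visible. The following minimal sketch shows what `errors='coerce'` would do if a column did contain a stray string. The toy frame and its values are hypothetical and are not taken from the attrition data.

```python
import pandas as pd

# Hypothetical toy frame: one clean column and one with an unparseable entry.
toy = pd.DataFrame({
    "Age": ["41", "49", "37"],
    "HourlyRate": ["94", "n/a", "92"],
})

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising.
toy_numeric = toy.apply(pd.to_numeric, errors="coerce")

# Count the NaNs introduced per column, then drop the affected rows.
print(toy_numeric.isnull().sum())
toy_clean = toy_numeric.dropna()
print(toy_clean.shape)
```

In a real pipeline the same row mask would need to be applied to the target frame as well, otherwise features and labels would no longer line up.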


In [12]:
# Create a StandardScaler
scaler = StandardScaler()

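# Fit the StandardScaler to the full feature set.
# Note: the train/test split only happens in Part 2, so the scaler also sees
# the rows that later become test data. Fitting on the training rows alone
# would avoid this leakage.
scaler.fit(X_df)


# Scale the features
X_scaled = scaler.transform(X_df)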

X_scaled


array([[ 0.4463504 , -1.01090934, -0.89168825, ...,  2.12513592,
        -1.1505541 , -0.42164246],
       [ 1.32236521, -0.14714972, -1.86842575, ..., -0.67804939,
         2.12930601, -0.1645114 ],
       [ 0.008343  , -0.88751511, -0.89168825, ...,  1.32422583,
        -0.0572674 , -0.55020799],
       ...,
       [-1.08667552, -0.64072665,  0.08504925, ..., -0.67804939,
         1.30934098, -0.67877352],
       [ 1.32236521, -0.88751511,  0.08504925, ..., -0.27759435,
        -0.33058907,  0.7354473 ],
       [-0.32016256, -0.14714972,  0.08504925, ..., -0.27759435,
        -0.87723243, -0.67877352]])
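The scaler in this notebook is fitted before the data is split. A common alternative is to split first and fit the scaler on the training rows only, so that test-set statistics cannot influence preprocessing. The sketch below illustrates that ordering on synthetic data. The array `X_demo` and the `_demo` names are hypothetical stand-ins, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for X_df: 200 rows, 3 numeric features.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

# Split first, so the test rows never influence the scaler.
X_train_demo, X_test_demo = train_test_split(X_demo, random_state=25)

# Fit on the training rows only, then apply the same transform to both sets.
scaler_demo = StandardScaler().fit(X_train_demo)
X_train_scaled_demo = scaler_demo.transform(X_train_demo)
X_test_scaled_demo = scaler_demo.transform(X_test_demo)

# Training columns end up with mean ~0 and std ~1; test columns are close but not exact.
print(X_train_scaled_demo.mean(axis=0).round(3))
print(X_test_scaled_demo.mean(axis=0).round(3))
```

The test set is transformed with the training mean and standard deviation, which mirrors how unseen data would be handled after deployment.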

In [13]:
from sklearn.preprocessing import OneHotEncoder

# Create a OneHotEncoder for the Department column
encoder_dept = OneHotEncoder(sparse_output=False)

# Fit the encoder to the Department column in y_df
encoder_dept.fit(y_df[['Department']])

# Transform the Department column into a one-hot NumPy array
department_encoded = encoder_dept.transform(y_df[['Department']])

# The encoded array is kept separate from y_df and passed to the model directly
department_encoded





array([[0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       ...,
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.]])

In [16]:
# Create a OneHotEncoder for the Attrition column
encoder_attr = OneHotEncoder(sparse_output=False)

# Fit the encoder to the Attrition column in y_df
encoder_attr.fit(y_df[['Attrition']])

# Transform the Attrition column into a one-hot NumPy array
attrition_encoded = encoder_attr.transform(y_df[['Attrition']])
attrition_encoded


array([[0., 1.],
       [1., 0.],
       [0., 1.],
       ...,
       [1., 0.],
       [1., 0.],
       [1., 0.]])

## Part 2: Create, Compile, and Train the Model

In [17]:
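# Split the features and both targets in one call so the rows stay aligned.
# With no test_size given, the default is 75% training and 25% testing data.
# random_state=25 seeds the shuffle so the split is reproducible; it is unrelated to the 25% test size.
X_train_scaled, X_test_scaled, dept_train, dept_test, attr_train, attr_test = train_test_split(
    X_scaled, department_encoded, attrition_encoded, random_state=25
)
X_train_scaled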

array([[ 0.55585225,  0.34642721,  0.08504925, ...,  2.12513592,
         1.85598433, -0.55020799],
       [-0.32016256,  1.21018683,  0.08504925, ...,  0.52331574,
        -0.60391075,  0.47831624],
       [-0.10115885, -1.01090934,  0.08504925, ..., -1.07850444,
        -0.0572674 , -0.1645114 ],
       ...,
       [-1.41518107,  1.70376376,  0.08504925, ..., -0.67804939,
        -1.1505541 , -0.67877352],
       [-1.08667552, -0.51733242,  0.08504925, ..., -0.67804939,
        -0.87723243, -0.93590457],
       [ 0.33684855,  0.7166099 ,  0.08504925, ..., -0.67804939,
        -0.60391075,  0.86401283]])
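The role of `random_state` is easy to confuse with the split size. The sketch below, using small hypothetical arrays, illustrates that the same seed reproduces the same split, that the default test share is 25%, and that several arrays passed together are shuffled consistently.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical features and two aligned targets, 20 rows each.
X_demo = np.arange(20).reshape(20, 1)
y_a = np.arange(20) * 10
y_b = np.arange(20) * 100

# Same seed, same split: random_state seeds the shuffle, it does not set the split size.
split_1 = train_test_split(X_demo, y_a, y_b, random_state=25)
split_2 = train_test_split(X_demo, y_a, y_b, random_state=25)

X_tr, X_te, a_tr, a_te, b_tr, b_te = split_1

# Default test_size is 0.25, so 5 of the 20 rows go to the test set.
print(X_tr.shape, X_te.shape)

# Rows stay aligned across all arrays passed in the same call.
print(np.array_equal(X_tr.ravel() * 10, a_tr))
```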

In [18]:
# Find the number of columns in the X training data.
num_cols = X_train_scaled.shape[1]

# Create the input layer
input_layer = layers.Input(shape=(num_cols,), name='input_features')

# Create at least two shared layers
shared_layer1 = layers.Dense(64, activation='relu')(input_layer)
shared_layer2 = layers.Dense(128, activation='relu')(shared_layer1)

# Keras offers two main ways to build a network: the Sequential API and the functional API.
# A model with two output branches needs the functional API, which is used here.


In [19]:
# Create a branch for Department
# with a hidden layer and an output layer
# Create the hidden layer
department_layer = layers.Dense(32, activation='relu')(shared_layer2)
# Create the output layer
department_output = layers.Dense(department_encoded.shape[1], activation='sigmoid', name='department_output')(department_layer)
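Department has three mutually exclusive classes, so a softmax output (in Keras, `activation='softmax'` paired with `loss='categorical_crossentropy'`) is the more conventional choice than sigmoid, which scores each class independently. The results in this notebook were produced with sigmoid. The NumPy sketch below only illustrates the difference between the two functions on a hypothetical set of raw scores.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function: each score is squashed independently."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Normalised exponentials: the outputs form one probability distribution."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

# Hypothetical raw scores for the three departments.
logits = np.array([2.0, 1.0, -1.0])

sig = sigmoid(logits)
soft = softmax(logits)

# Sigmoid outputs need not sum to 1; softmax outputs always do.
print(sig, sig.sum())
print(soft, soft.sum())
```

Both functions rank the classes in the same order here, but only the softmax values can be read as probabilities over the three departments.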

In [20]:
# Create a branch for Attrition
# with a hidden layer and an output layer

# Create the hidden layer
attrition_layer = layers.Dense(32, activation='relu')(shared_layer2)

# Create the output layer
attrition_output = layers.Dense(attrition_encoded.shape[1], activation='sigmoid', name='attrition_output')(attrition_layer)

In [26]:
# Create the model

model = Model(inputs=input_layer, outputs=[department_output, attrition_output])
# Compile the model
model.compile(optimizer='adam',
              loss={'department_output': 'binary_crossentropy', 'attrition_output': 'binary_crossentropy'},
              metrics={'department_output': 'accuracy', 'attrition_output': 'accuracy'})

# Summarize the model

model.summary()

In [27]:
X_train_scaled.shape
dept_train.shape
attr_train.shape #(1102,2)
attrition_encoded.shape #(1470, 2)

(1470, 2)

In [28]:
# Train the model
history = model.fit(
    X_train_scaled,
    {'department_output':dept_train, 'attrition_output': attr_train},
    epochs=100,
    batch_size=32
    #,
    #validation_split=0.2
)

Epoch 1/100
35/35 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - attrition_output_accuracy: 0.8171 - attrition_output_loss: 0.5610 - department_output_accuracy: 0.5474 - department_output_loss: 0.6027 - loss: 1.1637
Epoch 2/100
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - attrition_output_accuracy: 0.8305 - attrition_output_loss: 0.4495 - department_output_accuracy: 0.6585 - department_output_loss: 0.4822 - loss: 0.9318
Epoch 3/100
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - attrition_output_accuracy: 0.8432 - attrition_output_loss: 0.4054 - department_output_accuracy: 0.6651 - department_output_loss: 0.4722 - loss: 0.8774
Epoch 4/100
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - attrition_output_accuracy: 0.8264 - attrition_output_loss: 0.4268 - department_output_accuracy: 0.6553 - department_output_loss: 0.4765 - loss: 0.9032
Epoch 5/100
[remaining output truncated]

In [29]:
# Evaluate the model with the testing data
test_results = model.evaluate(X_test_scaled, {'department_output': dept_test, 'attrition_output': attr_test})
test_results


12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - attrition_output_accuracy: 0.7560 - attrition_output_loss: 1.9074 - department_output_accuracy: 0.5312 - department_output_loss: 1.7225 - loss: 3.6268


[3.4518697261810303,
 1.6395297050476074,
 1.8328219652175903,
 0.7771739363670349,
 0.5380434989929199]
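The total test loss (about 3.45) is far above the training loss seen in the first epochs (under 1.0), which is consistent with overfitting over 100 epochs. One common remedy is to hold out validation data (for example via the commented-out `validation_split`) and stop training once validation loss stops improving. Keras provides the `EarlyStopping` callback for this. The pure-Python sketch below only illustrates the patience rule behind that idea. The function name and the loss values are hypothetical, and this is not Keras's implementation.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop condition.
    return len(val_losses), best_epoch

# Hypothetical validation losses: they improve, then drift upward.
val_losses = [1.10, 0.95, 0.90, 0.92, 0.97, 1.05, 1.20, 1.40]
stop_epoch, best_epoch = best_epoch_with_patience(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

With these made-up numbers, training would halt at epoch 6 and the weights from epoch 3 would be the ones worth keeping.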

In [30]:
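# Print the accuracy for both department and attrition.
# The per-output accuracies appear to be listed alphabetically in test_results
# (attrition at index 3, department at index 4), matching the progress bar above.
# Passing return_dict=True to model.evaluate would avoid relying on positions.
print(f"Department Accuracy: {test_results[4]}")
print(f"Attrition Accuracy: {test_results[3]}")


Department Accuracy: 0.5380434989929199
Attrition Accuracy: 0.7771739363670349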


# Summary

In the provided space below, briefly answer the following questions.

1. Is accuracy the best metric to use on this data? Why or why not?

2. What activation functions did you choose for your output layers, and why?

3. Can you name a few ways that this model might be improved?


1. Accuracy is a common metric, but it is not the best choice on its own here. In this dataset far more employees stay than leave, so a model can reach high accuracy by mostly predicting the majority class. Precision, recall, or F1-score give a more informative picture of how well the minority class is predicted.

2. I chose the sigmoid activation for both output layers. Sigmoid outputs a value between 0 and 1 that can be interpreted as a probability, which suits the binary Attrition target. For multi-class classification with mutually exclusive classes, such as the three-valued Department target, softmax is usually the better fit: it outputs a probability distribution over all classes, making it easy to pick the most likely one. For regression (predicting a continuous value), a linear activation would be appropriate.

3. The model could be improved by enhancing the training data, trying other model types, and tuning hyperparameters.

Diverse and comprehensive data:
- Collect more data: a larger dataset, especially one representative of diverse employee demographics and roles, can help the model generalize better and avoid bias.
- Clean and preprocess data: ensure data accuracy, handle missing values appropriately, and transform data into a suitable format for the model.
- Include relevant factors: add features related to attrition, such as performance, work-life balance, employee engagement, and career growth opportunities. Some of these (for example PerformanceRating and WorkLifeBalance) are already in the dataset but were not among the 10 selected columns.
- Consider time-series data: if possible, incorporate historical data to capture trends and patterns in employee attrition.

Explore other models:
- Experiment with other machine learning models, such as decision trees, random forests, and support vector machines, as well as different neural network architectures, to see which performs best on the data.

Hyperparameter tuning:
- Try different numbers of layers, units, and learning rates (e.g. with Keras Tuner).
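The point in answer 1 about accuracy on imbalanced classes can be made concrete with a small sketch. The labels below are hypothetical (84 "No" and 16 "Yes") and the classifier simply predicts "No" for everyone. The helper `binary_report` is an illustrative function written for this example, not a library call.

```python
def binary_report(y_true, y_pred, positive="Yes"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced labels and a model that always predicts the majority class.
y_true = ["No"] * 84 + ["Yes"] * 16
y_pred = ["No"] * 100

report = binary_report(y_true, y_pred)
print(report)
```

The always-"No" model scores 84% accuracy while its recall for leavers is zero, which is exactly the failure that accuracy alone hides. For real evaluations, `sklearn.metrics.classification_report` provides the same metrics per class.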

