# Problem Statement:

This data set covers three groups of customers: those who paid off their loans, those who went past due and were put into collection without repaying the loan and interest, and those who paid off only after being put into collection. The financial product is a bullet loan: customers repay the entire debt in a single payment by the end of the term rather than on an installment schedule. They may also pay off earlier than scheduled.

### Attribute information:

- Loan_status: Whether a loan is paid off, in the collection, new customer yet to pay off, or paid off after the collection efforts

- Principal: Basic principal loan amount at the origination

- terms: Payoff schedule; can be weekly (7 days), biweekly, or monthly

- Age, education, gender: A customer’s basic demographic information

### Importing the required libraries 

In [1]:
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Libraries to help with data visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Library to encode the variables
from sklearn.preprocessing import OneHotEncoder
# Library to split data
from sklearn.model_selection import train_test_split  
# library to import different optimizers
from tensorflow.keras import optimizers
# Library to import different loss functions 
from tensorflow.keras import losses
from tensorflow.keras.layers import Dense
# Library to avoid the warnings 
import warnings
warnings.filterwarnings('ignore')
# importing keras library
from tensorflow import keras
# library to convert the target variables to numpy arrays
from tensorflow.keras.utils import to_categorical
# library to plot classification report
from sklearn.metrics import classification_report

### Q Import the dataset and answer the below question. 
### Is the given problem statement a binary classification problem, and how many unique values are present in the target(loan_status) column? 
- Yes,2
- No, 5
- Yes,4
- No, 3

In [2]:
data = pd.read_csv('Loan_payments_data.csv')

In [3]:
data.loan_status.nunique()

3

### Correct Answer:  No, 3 

**Explanation:** The target has three classes to predict, so this is a multi-class classification problem. It would be a binary classification problem only if there were exactly two classes.
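The same decision can be made programmatically from the number of distinct target values. The sketch below is a minimal illustration on a made-up label array; the three status strings are assumptions for the example and are not read from `Loan_payments_data.csv`.

```python
import numpy as np

# Hypothetical target values; in the notebook they come from data.loan_status
labels = np.array(["PAIDOFF", "COLLECTION", "PAIDOFF", "COLLECTION_PAIDOFF", "PAIDOFF"])

# Distinct classes and how often each one occurs
classes, counts = np.unique(labels, return_counts=True)
n_classes = len(classes)

# Two classes means binary; more than two means multi-class
if n_classes == 2:
    problem_type = "binary"
elif n_classes > 2:
    problem_type = "multi-class"
else:
    problem_type = "not a classification target"

print(n_classes, problem_type)
print(dict(zip(classes.tolist(), counts.tolist())))
```

Looking at the per-class counts as well as the number of classes is useful here, because the later classification reports depend heavily on how the classes are distributed.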

### Q Build a Neural Network model on the dataset and Obtain accuracy by following the below steps:

- Store the Independent and Dependent features in X and y
- Use train_test_split to split the data (80% for training and 20% for testing)
- Convert the target feature into a numpy array using Keras to_categorical function

Use the parameters below.
- The number of neurons in the first and second layers is 64 and 32, respectively.
- Use ReLU as the activation function in the hidden layers and Adam as the optimizer with a learning rate of 1e-3
- Build the model for 50 epochs


- **Note** - Do not use stratified sampling or callbacks.


- `>`80 and `<`90 
- `>`90 and `<`95 
- `>`95 
- `<`70

In [4]:
#Store the Independent and Dependent features in X and y
X = data.drop('loan_status',axis=1)
Y = data[['loan_status']]

In [5]:
# Use train_test split to split the data (80% for training and 20% for testing)
X_train, X_test, y_train, y_test=train_test_split(X, Y, test_size=0.2, random_state=1)

In [6]:
# Convert the target feature into a NumPy array using Keras to_categorical function
y_train = to_categorical(y_train, 3)
y_test_cat = to_categorical(y_test, 3)
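`to_categorical` turns integer class labels into one-hot rows, which is the target layout that categorical cross-entropy expects. As a rough illustration of that layout, the same encoding can be sketched in plain NumPy. This is an equivalent for integer labels 0 to 2, not the Keras implementation, and `toy_labels` is made up for the example.

```python
import numpy as np

# Made-up integer class labels (0, 1 or 2), standing in for y_train
toy_labels = np.array([0, 2, 1, 0])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels encodes every row at once
one_hot = np.eye(num_classes, dtype="float32")[toy_labels]

print(one_hot)
```

Each row contains a single 1 in the column of its class, so the encoded target has shape (number of rows, 3).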

In [7]:
# Model Building
# Defining the model
model = keras.Sequential()
# Adding the first hidden layer with 64 neurons and ReLU activation; the input has 11 features
model.add(Dense(64, activation='relu', input_shape=(11,)))
# Adding the second hidden layer with 32 neurons and ReLU activation
model.add(Dense(32, activation='relu'))
# Defining the output layer with 3 neurons and softmax activation
model.add(Dense(3, activation='softmax'))
# Defining the Adam optimizer (learning_rate replaces the deprecated lr argument)
adam = optimizers.Adam(learning_rate=1e-3)
# Compiling the model with categorical cross-entropy as the loss function and accuracy as the metric
model.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
# Fitting the model on X_train and y_train for 50 epochs with a 20% validation split
history = model.fit(X_train, y_train, epochs=50, validation_split=0.2, verbose=2)

Epoch 1/50
10/10 - 1s - loss: 48.4591 - accuracy: 0.2156 - val_loss: 8.2859 - val_accuracy: 0.6000 - 558ms/epoch - 56ms/step
Epoch 2/50
10/10 - 0s - loss: 14.5368 - accuracy: 0.6250 - val_loss: 14.0550 - val_accuracy: 0.6125 - 27ms/epoch - 3ms/step
Epoch 3/50
10/10 - 0s - loss: 7.2464 - accuracy: 0.3406 - val_loss: 3.1898 - val_accuracy: 0.4375 - 34ms/epoch - 3ms/step
Epoch 4/50
10/10 - 0s - loss: 3.9233 - accuracy: 0.5594 - val_loss: 2.0731 - val_accuracy: 0.4375 - 33ms/epoch - 3ms/step
Epoch 5/50
10/10 - 0s - loss: 2.4961 - accuracy: 0.4969 - val_loss: 2.6696 - val_accuracy: 0.1125 - 34ms/epoch - 3ms/step
Epoch 6/50
10/10 - 0s - loss: 1.9635 - accuracy: 0.4406 - val_loss: 2.6827 - val_accuracy: 0.2000 - 35ms/epoch - 4ms/step
Epoch 7/50
10/10 - 0s - loss: 1.6888 - accuracy: 0.4563 - val_loss: 2.0999 - val_accuracy: 0.4375 - 34ms/epoch - 3ms/step
Epoch 8/50
10/10 - 0s - loss: 1.5999 - accuracy: 0.4938 - val_loss: 1.7981 - val_accuracy: 0.5500 - 35ms/epoch - 4ms/step
Epoch 9/50
10/10 - 

### Correct Answer: **<70**
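Accuracy here is the fraction of rows whose highest-probability class matches the true label. The sketch below shows that calculation on made-up softmax outputs; `probs` and `true_labels` are illustrative values, not predictions from the model above.

```python
import numpy as np

# Made-up softmax outputs for four rows (each row sums to 1)
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
    [0.2, 0.5, 0.3],
])
# Made-up true integer labels for the same four rows
true_labels = np.array([0, 1, 2, 1])

# Predicted class = index of the largest probability in each row
predicted = probs.argmax(axis=1)

# Accuracy = share of rows where prediction and label agree
accuracy = (predicted == true_labels).mean()
print(f"accuracy = {accuracy:.2f}")
```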

### Q For the model built above, find the F1-score for the 0th class using the classification report.

- 0.51 - 0.55
- 0.71 - 0.80
- 0.60 - 0.70
- 0.35 - 0.50

In [8]:
# Predicting on Test data 
y_pred=model.predict(X_test)

In [9]:
# Applying the argmax function to get the predicted class index for each row
y_pred_final = np.argmax(y_pred, axis=1)

In [10]:
# Classification report 
print(classification_report(y_test,y_pred_final))

              precision    recall  f1-score   support

           0       0.51      1.00      0.68        51
           1       0.00      0.00      0.00        23
           2       0.00      0.00      0.00        26

    accuracy                           0.51       100
   macro avg       0.17      0.33      0.23       100
weighted avg       0.26      0.51      0.34       100



### Correct Answer: 0.60 - 0.70
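The class-0 row of the report can be reproduced by hand. Recall is 1.00 for class 0 and 0.00 for classes 1 and 2, and class-0 precision is 0.51 on 100 test rows, which implies the model predicted class 0 for every row: 51 true positives, 49 false positives, and no false negatives. A short sketch of the arithmetic, with the counts derived from the report rather than taken from a confusion matrix in the notebook:

```python
# Counts for class 0, derived from the classification report above
tp = 51  # class-0 rows predicted as class 0
fp = 49  # rows of classes 1 and 2 (23 + 26) predicted as class 0
fn = 0   # class-0 rows predicted as another class

precision = tp / (tp + fp)
recall = tp / (tp + fn)
# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2), round(f1, 2))
```

The F1-score of about 0.68 therefore comes from perfect recall combined with a precision that equals the share of class 0 in the test set.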

### Q Build a model on the data using below hyperparameters and find the accuracy. 

- The number of neurons in the first, second, third, and fourth layers should be 128, 64, 64, and 32, respectively.
- Use ReLU as the activation function in the hidden layers and Adam as the optimizer with a learning rate of 1e-3
- Build the model for 200 epochs

**Note** - Do not use stratified sampling or callbacks.

- `>`80 and `<`90
- `>`90 and `<`95
- `>`95
- `<`75

In [11]:
# Model Building
# Defining the model
model_1 = keras.Sequential()
# Adding the first hidden layer with 128 neurons and ReLU activation; the input has 11 features
model_1.add(Dense(128, activation='relu', kernel_initializer='he_uniform', input_shape=(11,)))
# Adding the second hidden layer with 64 neurons and ReLU activation
model_1.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
# Adding the third hidden layer with 64 neurons and ReLU activation
model_1.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
# Adding the fourth hidden layer with 32 neurons and ReLU activation
model_1.add(Dense(32, activation='relu', kernel_initializer='he_uniform'))
# Defining the output layer with 3 neurons and softmax activation
model_1.add(Dense(3, activation='softmax'))
# Defining the Adam optimizer (learning_rate replaces the deprecated lr argument)
adam = optimizers.Adam(learning_rate=1e-3)
# Compiling the model with categorical cross-entropy as the loss function and accuracy as the metric
model_1.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
# Fitting the model on X_train and y_train for 200 epochs with a 20% validation split
history_1 = model_1.fit(X_train, y_train, validation_split=0.2, epochs=200, batch_size=128, verbose=2)

Epoch 1/200
3/3 - 0s - loss: 447.5361 - accuracy: 0.2031 - val_loss: 286.5123 - val_accuracy: 0.2375 - 363ms/epoch - 121ms/step
Epoch 2/200
3/3 - 0s - loss: 249.8191 - accuracy: 0.1719 - val_loss: 88.4429 - val_accuracy: 0.1500 - 26ms/epoch - 9ms/step
Epoch 3/200
3/3 - 0s - loss: 57.0634 - accuracy: 0.3063 - val_loss: 56.8718 - val_accuracy: 0.6125 - 25ms/epoch - 8ms/step
Epoch 4/200
3/3 - 0s - loss: 57.0386 - accuracy: 0.6250 - val_loss: 86.3496 - val_accuracy: 0.6125 - 25ms/epoch - 8ms/step
Epoch 5/200
3/3 - 0s - loss: 72.2492 - accuracy: 0.6250 - val_loss: 77.2203 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 6/200
3/3 - 0s - loss: 56.9187 - accuracy: 0.6250 - val_loss: 39.6999 - val_accuracy: 0.6125 - 22ms/epoch - 7ms/step
Epoch 7/200
3/3 - 0s - loss: 24.4674 - accuracy: 0.5000 - val_loss: 28.4140 - val_accuracy: 0.1500 - 25ms/epoch - 8ms/step
Epoch 8/200
3/3 - 0s - loss: 24.3799 - accuracy: 0.1656 - val_loss: 10.3069 - val_accuracy: 0.2500 - 23ms/epoch - 8ms/step
Epoch 9/20

Epoch 68/200
3/3 - 0s - loss: 3.8316 - accuracy: 0.6250 - val_loss: 3.3532 - val_accuracy: 0.2250 - 24ms/epoch - 8ms/step
Epoch 69/200
3/3 - 0s - loss: 3.4130 - accuracy: 0.3000 - val_loss: 6.6318 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 70/200
3/3 - 0s - loss: 5.1016 - accuracy: 0.5469 - val_loss: 2.3021 - val_accuracy: 0.1625 - 24ms/epoch - 8ms/step
Epoch 71/200
3/3 - 0s - loss: 3.3622 - accuracy: 0.3313 - val_loss: 5.8179 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 72/200
3/3 - 0s - loss: 5.7506 - accuracy: 0.5344 - val_loss: 2.9344 - val_accuracy: 0.6125 - 16ms/epoch - 5ms/step
Epoch 73/200
3/3 - 0s - loss: 3.3594 - accuracy: 0.5594 - val_loss: 3.8875 - val_accuracy: 0.4625 - 16ms/epoch - 5ms/step
Epoch 74/200
3/3 - 0s - loss: 3.0795 - accuracy: 0.5156 - val_loss: 2.4102 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 75/200
3/3 - 0s - loss: 2.6701 - accuracy: 0.5437 - val_loss: 3.3521 - val_accuracy: 0.1625 - 24ms/epoch - 8ms/step
Epoch 76/200
3/3 - 0s - 

Epoch 135/200
3/3 - 0s - loss: 3.8796 - accuracy: 0.5094 - val_loss: 4.8222 - val_accuracy: 0.6125 - 26ms/epoch - 9ms/step
Epoch 136/200
3/3 - 0s - loss: 4.0112 - accuracy: 0.5531 - val_loss: 3.5935 - val_accuracy: 0.1500 - 24ms/epoch - 8ms/step
Epoch 137/200
3/3 - 0s - loss: 3.1680 - accuracy: 0.4625 - val_loss: 3.6013 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 138/200
3/3 - 0s - loss: 2.8119 - accuracy: 0.3875 - val_loss: 3.2612 - val_accuracy: 0.6125 - 26ms/epoch - 9ms/step
Epoch 139/200
3/3 - 0s - loss: 3.3397 - accuracy: 0.6250 - val_loss: 1.6082 - val_accuracy: 0.2750 - 24ms/epoch - 8ms/step
Epoch 140/200
3/3 - 0s - loss: 1.8822 - accuracy: 0.2656 - val_loss: 1.4704 - val_accuracy: 0.5875 - 24ms/epoch - 8ms/step
Epoch 141/200
3/3 - 0s - loss: 2.2479 - accuracy: 0.4781 - val_loss: 3.8803 - val_accuracy: 0.1875 - 27ms/epoch - 9ms/step
Epoch 142/200
3/3 - 0s - loss: 2.9709 - accuracy: 0.3531 - val_loss: 1.4814 - val_accuracy: 0.5875 - 26ms/epoch - 9ms/step
Epoch 143/200
3/

### Correct Answer: <75 
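To get a sense of how much larger this network is than the first one, the trainable parameters of a stack of Dense layers can be counted by hand: each layer has inputs × units weights plus units biases. The sketch below assumes the 11 input features implied by `input_shape=(11,)`; `dense_param_count` is a helper name introduced only for this illustration.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases for a chain of fully connected layers."""
    total = 0
    for units in layer_units:
        # One weight per (input, unit) pair plus one bias per unit
        total += n_inputs * units + units
        # The outputs of this layer are the inputs of the next one
        n_inputs = units
    return total

# First model: 64 -> 32 -> 3 (output)
first_model = dense_param_count(11, [64, 32, 3])
# Deeper model: 128 -> 64 -> 64 -> 32 -> 3 (output)
deeper_model = dense_param_count(11, [128, 64, 64, 32, 3])

print(first_model, deeper_model)
```

The deeper model has roughly five times as many parameters, yet the training logs above show no more stable learning, so the extra capacity alone does not help on this data.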

### Q Build a model on the data using below hyperparameters and find the precision of 0th class using the classification report.

* The number of neurons in the first and second layers should be 64 and 32, respectively.
* Use ReLU as the activation function in the hidden layers and SGD as the optimizer with a learning rate of 1e-3
* Build the model for 100 epochs

**Note** - Do not use stratified sampling or callbacks.

- `>`80 and `<`90
- `>`60 and `<`70
- `>`95
- `<`60

In [12]:
# Model Building
# Defining the model
model_2 = keras.Sequential()
# Adding the first hidden layer with 64 neurons and ReLU activation; the input has 11 features
model_2.add(Dense(64, activation='relu', kernel_initializer='he_uniform', input_shape=(11,)))
# Adding the second hidden layer with 32 neurons and ReLU activation
model_2.add(Dense(32, activation='relu', kernel_initializer='he_uniform'))
# Defining the output layer with 3 neurons and softmax activation
model_2.add(Dense(3, activation='softmax'))
# Defining the SGD optimizer (learning_rate replaces the deprecated lr argument)
SGD = optimizers.SGD(learning_rate=1e-3)
# Compiling the model with categorical cross-entropy as the loss function and accuracy as the metric
model_2.compile(loss=losses.categorical_crossentropy, optimizer=SGD, metrics=['accuracy'])
# Fitting the model on X_train and y_train for 100 epochs with a 20% validation split
history_2 = model_2.fit(X_train, y_train, validation_split=0.2, epochs=100, batch_size=128, verbose=2)

Epoch 1/100
3/3 - 0s - loss: 864.0719 - accuracy: 0.3688 - val_loss: 1.0983 - val_accuracy: 0.6125 - 281ms/epoch - 94ms/step
Epoch 2/100
3/3 - 0s - loss: 1.0981 - accuracy: 0.6250 - val_loss: 1.0979 - val_accuracy: 0.6125 - 16ms/epoch - 5ms/step
Epoch 3/100
3/3 - 0s - loss: 1.0978 - accuracy: 0.6250 - val_loss: 1.0975 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 4/100
3/3 - 0s - loss: 1.0974 - accuracy: 0.6250 - val_loss: 1.0972 - val_accuracy: 0.6125 - 25ms/epoch - 8ms/step
Epoch 5/100
3/3 - 0s - loss: 1.0970 - accuracy: 0.6250 - val_loss: 1.0968 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 6/100
3/3 - 0s - loss: 1.0966 - accuracy: 0.6250 - val_loss: 1.0965 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 7/100
3/3 - 0s - loss: 1.0962 - accuracy: 0.6250 - val_loss: 1.0961 - val_accuracy: 0.6125 - 25ms/epoch - 8ms/step
Epoch 8/100
3/3 - 0s - loss: 1.0959 - accuracy: 0.6250 - val_loss: 1.0957 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 9/100
3/3 - 0s - loss:

Epoch 69/100
3/3 - 0s - loss: 1.0741 - accuracy: 0.6250 - val_loss: 1.0753 - val_accuracy: 0.6125 - 24ms/epoch - 8ms/step
Epoch 70/100
3/3 - 0s - loss: 1.0738 - accuracy: 0.6250 - val_loss: 1.0750 - val_accuracy: 0.6125 - 16ms/epoch - 5ms/step
Epoch 71/100
3/3 - 0s - loss: 1.0734 - accuracy: 0.6250 - val_loss: 1.0747 - val_accuracy: 0.6125 - 32ms/epoch - 11ms/step
Epoch 72/100
3/3 - 0s - loss: 1.0731 - accuracy: 0.6250 - val_loss: 1.0743 - val_accuracy: 0.6125 - 32ms/epoch - 11ms/step
Epoch 73/100
3/3 - 0s - loss: 1.0727 - accuracy: 0.6250 - val_loss: 1.0740 - val_accuracy: 0.6125 - 21ms/epoch - 7ms/step
Epoch 74/100
3/3 - 0s - loss: 1.0724 - accuracy: 0.6250 - val_loss: 1.0737 - val_accuracy: 0.6125 - 21ms/epoch - 7ms/step
Epoch 75/100
3/3 - 0s - loss: 1.0721 - accuracy: 0.6250 - val_loss: 1.0734 - val_accuracy: 0.6125 - 27ms/epoch - 9ms/step
Epoch 76/100
3/3 - 0s - loss: 1.0717 - accuracy: 0.6250 - val_loss: 1.0731 - val_accuracy: 0.6125 - 47ms/epoch - 16ms/step
Epoch 77/100
3/3 - 0s
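The loss values in this log sit just below 1.0986, which is ln(3): the categorical cross-entropy of a model that assigns equal probability to each of three classes. The sketch below only checks that arithmetic. Reading the log as "the outputs are still close to uniform" is an interpretation, not something the notebook states.

```python
import math
import numpy as np

num_classes = 3

# Cross-entropy of a uniform prediction is -ln(1/3) = ln(3)
uniform_loss = math.log(num_classes)

# Same value computed from the cross-entropy definition
y_true = np.array([1.0, 0.0, 0.0])                # one-hot target
y_prob = np.full(num_classes, 1.0 / num_classes)  # uniform prediction
cross_entropy = float(-np.sum(y_true * np.log(y_prob)))

print(round(uniform_loss, 4), round(cross_entropy, 4))
```

A loss that decreases this slowly while accuracy stays fixed at 0.6250 suggests that the predicted class does not change from epoch to epoch, which matches the single-class predictions in the classification report that follows.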

In [13]:
# Predicting on test data
y_pred_2=model_2.predict(X_test)

In [14]:
# Applying argmax
y_pred_final_2 = np.argmax(y_pred_2, axis=1)

In [15]:
# Classification report
print(classification_report(y_test,y_pred_final_2))

              precision    recall  f1-score   support

           0       0.51      1.00      0.68        51
           1       0.00      0.00      0.00        23
           2       0.00      0.00      0.00        26

    accuracy                           0.51       100
   macro avg       0.17      0.33      0.23       100
weighted avg       0.26      0.51      0.34       100



### Correct Answer: **<60**
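The `macro avg` and `weighted avg` rows of the report follow directly from the per-class rows. The sketch below recomputes the two precision averages from the values printed above; the class-0 precision is taken as 51/100, which is what the report's 0.51 corresponds to when every test row is predicted as class 0.

```python
# Per-class precision and support, taken from the report above
precisions = [51 / 100, 0.0, 0.0]
supports = [51, 23, 26]

# Macro average: plain mean over classes, every class counts equally
macro_precision = sum(precisions) / len(precisions)

# Weighted average: mean over classes weighted by the number of true rows per class
weighted_precision = sum(p * s for p, s in zip(precisions, supports)) / sum(supports)

print(round(macro_precision, 2), round(weighted_precision, 2))
```

Both averages are far below the class-0 precision because classes 1 and 2 contribute zeros, which is why the averaged rows are a better summary of this model than the class-0 row alone.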