# Credit Scoring

In this activity, you will use a deep learning model to predict the credit scores of borrowers using alternative data.

## Instructions

Emerging economies represent a very large opportunity for fintech. Billions of new consumers have access to a digital wallet and want a loan at a lower interest rate. The trouble is that most of them don't have a credit score.

An alternative data firm is therefore collecting data on emerging-market consumers, ranging from utility bills and the industry they work in to responses to online surveys about good money habits. The firm has provided you with this data so that you can build a model that uses all of this information to produce a usable credit score for anyone interested in applying for a loan.

The dataset contains `68` encoded features (columns `0` to `67`), with all personally identifying information removed. The last two columns of the dataset (columns `68` and `69`) are preliminary credit score quality indicators that have been manually assigned by staff at the firm. (The firm thinks that if a model can be built for this labeled data, it can then be used to automatically make credit predictions about customers who haven't gone through this labeling process.)

1. Create a shallow neural network (`1` hidden layer) and a deep neural network (`2` hidden layers) to predict the credit scores represented in the data. Decide on your own how many neurons you will use on each hidden layer.

2. Fit each model for at least `800` epochs, setting `validation_split=0.3`.

3. Compare the loss metrics for the two models.

4. Compare train (loss) and test (val_loss) metrics for both models, and look for signs of overfitting.
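
One way to look for signs of overfitting in step 4 is to check whether the validation loss bottoms out and then rises while the training loss keeps falling. The sketch below is a minimal illustration, assuming a dictionary shaped like the `history` attribute of the object returned by Keras `fit()` (with `loss` and `val_loss` lists). The helper name `summarize_overfitting` and the toy numbers are hypothetical and are not part of the activity's data.

```python
def summarize_overfitting(history):
    """Summarize train vs. validation loss from a Keras-style history dict.

    `history` is assumed to hold two equal-length lists, "loss" and "val_loss",
    with one value per epoch.
    """
    loss = history["loss"]
    val_loss = history["val_loss"]

    # Index of the epoch with the lowest validation loss
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)

    return {
        "best_val_epoch": best_index + 1,          # 1-based epoch number
        "best_val_loss": val_loss[best_index],
        "final_gap": val_loss[-1] - loss[-1],      # large positive gap hints at overfitting
        "val_rose_after_best": val_loss[-1] > val_loss[best_index],
    }


# Toy history (made-up numbers): training loss keeps falling, validation loss turns upward
toy_history = {
    "loss": [1.0, 0.6, 0.4, 0.3, 0.2],
    "val_loss": [1.1, 0.7, 0.6, 0.65, 0.8],
}

summary = summarize_overfitting(toy_history)
print(summary)
```

With a fitted model, the same helper could be called on `model_1.history` or `model_2.history`, assuming those variables hold the objects returned by `fit()`. A validation loss that rises after its minimum while the training loss keeps dropping is a common, though not conclusive, sign of overfitting.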

## Hints

* Note that the model needs two regression outputs, one for each target column. Your model structure should reflect this.

* When fitting the model, you can set the parameter `verbose=0` in the `fit()` method to mute the printing of each epoch's results.
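
As a rough sketch of the first hint, a model with two regression outputs can end in a `Dense` layer with `2` units and a linear activation. The example below assumes random placeholder data with `68` features and `2` targets, an arbitrary `16` neurons in the hidden layer, and only `2` epochs to keep it quick. None of these values come from the activity, and the names `X_demo`, `y_demo`, `demo_model`, and `demo_history` are hypothetical.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Placeholder data (assumption): 32 samples, 68 features, 2 regression targets
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(32, 68))
y_demo = rng.normal(size=(32, 2))

demo_model = Sequential([
    Input(shape=(68,)),
    Dense(units=16, activation="relu"),   # hidden layer; 16 neurons is an arbitrary choice
    Dense(units=2, activation="linear"),  # one linear output per target column
])

# Mean squared error as the loss, adam as the optimizer, mse as the reported metric
demo_model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])

# verbose=0 mutes the per-epoch printout; validation_split holds out 30% of the rows
demo_history = demo_model.fit(
    X_demo, y_demo, validation_split=0.3, epochs=2, verbose=0
)

# Each prediction row has two values, matching the two target columns
preds = demo_model.predict(X_demo, verbose=0)
print(preds.shape)
print(sorted(demo_history.history.keys()))
```

The object returned by `fit()` keeps the per-epoch `loss` and `val_loss` values in its `history` dictionary, which is what the comparison plots later in the activity draw on.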


In [None]:
# Initial imports
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
# Read in data
data = Path("../Resources/credit_scores.csv")
df = pd.read_csv(data, header=None)
df.head()

## Prepare the data

In [None]:
# Create the features set (X) and the target set (y)

# The features dataset consists of columns 0 to 67
X = df.iloc[:, 0:68]

# The target consists of columns 68 and 69
y = df.iloc[:, 68:70]

# View data for the features set
X.head()

In [None]:
# Scale the data of the features set using the StandardScaler
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(X)
X = scaler.transform(X)

## Step 1. Create a shallow neural network (`1` hidden layer) to predict the credit scores represented in the data. Decide on your own how many neurons you will use on the hidden layer.

In [None]:
# Create a shallow, 1 hidden layer, neural network

# Instantiate an instance of the Sequential model
nn = # YOUR CODE HERE

# Create 1 hidden layer
# YOUR CODE HERE

# Create the output layer
# YOUR CODE HERE

# Compile the model
# Use mean_squared_error as the loss, adam as the optimizer, and mse as the metric.
# YOUR CODE HERE


## Step 2. Fit the shallow model for at least `800` epochs, setting `validation_split=0.3`.

In [None]:
# Fit the model
model_1 = # YOUR CODE HERE

## Step 1 (continued). Create a deep neural network (`2` hidden layers) to predict the credit scores represented in the data. Decide on your own how many neurons you will use on each hidden layer.

In [None]:
# Create a deep neural network with 2 hidden layers

# Instantiate an instance of the Sequential model
dnn = # YOUR CODE HERE

# Create the first hidden layer
# YOUR CODE HERE

# Create the second hidden layer
# YOUR CODE HERE

# Create the output layer
# YOUR CODE HERE

# Compile the model
# Use mean_squared_error as the loss, adam as the optimizer, and mse as the metric.
# YOUR CODE HERE


## Step 2 (continued). Fit the deep model for at least `800` epochs, setting `validation_split=0.3`.

In [None]:
# Fit the model
model_2 = # YOUR CODE HERE

## Evaluate the models

## Step 3. Compare the loss metrics for the two models.

In [None]:
# Plot the loss function of the training results for the two models
plt.plot(# YOUR CODE HERE)
plt.plot(# YOUR CODE HERE)
plt.title("loss_function - Training - 1 hidden layer Vs. 2 hidden layers")
plt.legend(["1 hidden layer", "2 hidden layers"])
plt.show()

## Step 4. Compare train (loss) and test (val_loss) metrics for both models, and look for signs of overfitting.

In [None]:
# Plot train vs test for the shallow neural net
plt.plot(# YOUR CODE HERE)
plt.plot(# YOUR CODE HERE)
plt.title("loss_function - 1 hidden layer - Train Vs. Test")
plt.legend(["train", "test"])
plt.show()

In [None]:
# Plot train vs test for the deep neural net
plt.plot(# YOUR CODE HERE)
plt.plot(# YOUR CODE HERE)
plt.title("loss_function - 2 hidden layers - Train Vs. Test")
plt.legend(["train", "test"])
plt.show()