
In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf

**Check the TensorFlow version**

In [None]:
tf.__version__

'2.18.0'

**Part 1: Data Preprocessing**

In [None]:
dataset = pd.read_csv('Churn_Modelling.csv')
dataset.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [None]:
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values
print(X)


[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


**Encoding Categorical Data**  
1. Label-encode the Gender column, converting it to zero or one (0 or 1).  
2. One-hot encode the Geography column.
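As a rough illustration of what these two encodings produce, the sketch below reproduces them with plain NumPy on a few made-up rows. The sample values and variable names are assumptions for illustration; the notebook itself uses scikit-learn's `LabelEncoder` and `OneHotEncoder`. Categories are sorted alphabetically here, which is also the order the scikit-learn encoders use.

```python
import numpy as np

# Made-up sample values, for illustration only.
gender = np.array(["Female", "Male", "Female", "Male"])
geography = np.array(["France", "Spain", "Germany", "France"])

# Label encoding: map each category to its index in the sorted category list.
gender_classes, gender_codes = np.unique(gender, return_inverse=True)

# One-hot encoding: one column per category, with a single 1 in each row.
geo_classes, geo_codes = np.unique(geography, return_inverse=True)
geo_one_hot = np.eye(len(geo_classes))[geo_codes]

print(gender_classes, gender_codes)
print(geo_classes)
print(geo_one_hot)
```

With this ordering, France becomes `[1, 0, 0]`, Germany `[0, 1, 0]`, and Spain `[0, 0, 1]`, which matches the first three columns of `X` printed later in the notebook.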

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

In [None]:
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]


In [None]:
# Geography
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

# Wrap the result of ct.fit_transform(X) in np.array() so that X stays a NumPy array

**Also**
1. Visualize the data to check for correlations.
2. Check for null values.
3. Check for missing data.
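A minimal sketch of the null-value and correlation checks, using a small made-up DataFrame in place of `Churn_Modelling.csv` (the values, including the deliberately missing one, are invented for illustration):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the churn dataset, with one deliberately missing value.
df = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699],
    "Age": [42, 41, np.nan, 39],
    "Exited": [1, 0, 1, 0],
})

# Count missing values per column.
null_counts = df.isnull().sum()
print(null_counts)

# Pairwise correlations between the numeric columns.
correlations = df.corr()
print(correlations)
```

In this notebook the same calls would be made on `dataset`, before the columns are split into `X` and `y`.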

In [None]:
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]


**Split the dataset into training and testing data**

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

**Feature scaling**

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
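The scaler above is fitted on the training set only and then reused on the test set, so no test-set statistics leak into training. As a sketch of what standardization computes, here is the same idea in plain NumPy on made-up numbers (an illustration of the arithmetic, not the scikit-learn implementation):

```python
import numpy as np

# Made-up feature columns (say, age and balance), for illustration only.
train = np.array([[40.0, 60000.0], [30.0, 0.0], [50.0, 120000.0]])
test = np.array([[45.0, 90000.0]])

# "fit": learn the mean and standard deviation from the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": apply the training statistics to both sets.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0))  # each column is centered near 0
print(test_scaled)
```

After scaling, features measured in very different units (years versus currency) end up on a comparable scale, which generally helps gradient-based training.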

**Part 2: Building the Artificial Neural Network (ANN)**  


1.   Initialize the ANN.
2.   Add the input layer and the first hidden layer.  
3.   Add the second hidden layer.  
4.   Add the output layer.  
5.   Training the ANN.  


**Understanding the code**  
1. keras = a popular high-level library that ships with TensorFlow (as `tf.keras`), providing the tools you need for building a neural network.
2. models = the Keras module that contains the model types you can build.
3. Sequential() = a model built as a linear stack of layers: input layer, then hidden layers, then output layer, added one stage at a time.

In [None]:
#1 Initializing the ANN
ann = tf.keras.models.Sequential()

1. layers = the Keras module that holds the layer types. `Sequential()` above creates an empty model; each `ann.add(tf.keras.layers. ...)` call then appends one layer. The first `Dense` layer we add acts as the first hidden layer and receives the input features directly.
2. Dense() = a fully connected layer: every neuron is connected to every output of the previous layer.  

**Parameters**
3. units = number of neurons we want in the layer. We can use GridSearchCV to refine this value. The number of inputs is not specified here; Keras infers it from the data the first time the model is called.
4. activation = the activation function applied to the layer's output.
5. 'relu' = Rectified Linear Unit. Every hidden layer has an activation function; this one returns max(0, x), that is, x itself if x is positive and 0 if x is negative.
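To make `units` and `'relu'` concrete, the sketch below computes what one dense layer does to a single input row, in plain NumPy. The weights are random placeholders rather than trained values, and the function names are illustrative, not part of Keras.

```python
import numpy as np

def relu(x):
    # max(0, x), applied element-wise.
    return np.maximum(0, x)

def dense_forward(inputs, weights, bias):
    # A dense layer: a weighted sum of all inputs per neuron, plus a bias, then the activation.
    return relu(inputs @ weights + bias)

rng = np.random.default_rng(0)
n_inputs, units = 12, 6                       # 12 feature columns in, 6 neurons out
inputs = rng.normal(size=(1, n_inputs))       # one made-up, already scaled observation
weights = rng.normal(size=(n_inputs, units))  # placeholder weights, not trained
bias = np.zeros(units)

hidden = dense_forward(inputs, weights, bias)
print(hidden.shape)
print(relu(np.array([-2.0, 0.0, 3.5])))
```

The weight matrix has one row per input and one column per neuron, which is why only `units` has to be chosen by hand.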

In [None]:
# Adding the input layer and the first hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))


In [None]:
# Adding the second hidden layer
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

**Understanding the code**
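1. sigmoid = activation function that squashes the output into the range 0 to 1, so it can be read as the probability that the customer leaves. When converting to a class, anything greater than 0.5 (50%) is considered a 1; 0.5 or below is considered a 0.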
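A small NumPy sketch of the sigmoid function and the 0.5 threshold, using made-up pre-activation values. The comparison uses `> 0.5`, the same test the prediction cells in this notebook apply.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Made-up raw outputs of the last layer, before the activation.
z = np.array([-3.0, 0.0, 2.0])
probabilities = sigmoid(z)

# Threshold at 0.5 to turn probabilities into class labels.
labels = (probabilities > 0.5).astype(int)

print(probabilities)
print(labels)
```

A raw output of exactly 0 gives a probability of exactly 0.5, which the strict `>` comparison maps to class 0.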

In [None]:
# Adding the output layer
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

**Training the ANN**  
1. Compiling the ANN.  
2. Training the model.  

Note that this may take a while (45+ mins).

In [None]:
#Compiling the ANN
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

**Understanding the code**  
1. optimizer = the algorithm that decides how the weights are updated after each batch, based on the gradient of the loss.
2. 'adam' = Adam (Adaptive Moment Estimation), a stochastic gradient-based optimizer that adapts the learning rate for each weight. See Module 11 slides/notes. Other kinds: SGD (often used in CNNs).
3. loss = a loss function measures how well a neural network model performs a certain task. The smaller the loss, the better the model.
+ It is a mathematical function that measures the difference between a model's predicted outputs and the actual target values, quantifying how wrong the model is on a given data point. Training adjusts the parameters to minimize this error. Common examples include Mean Squared Error (MSE) for regression tasks and cross-entropy loss for classification tasks.
4. 'binary_crossentropy' = the cross-entropy loss for two classes: C = -(1/N) * sum of (y*log(y_hat) + (1 - y)*log(1 - y_hat)). This is not the squared-error cost C = sum of ((1/2)*((y_hat - y)^2)), which is the kind of loss used for regression.
5. metrics = how we measure our model during training and evaluation. Metrics are only reported; they are not used to update the weights.
6. accuracy = the fraction of predictions that match the actual labels. The higher the accuracy the better the model, especially if the loss keeps getting smaller.

+ Other types: precision, recall, F1 score - check classification slides. To add more metrics, list them within the square brackets, separated by commas, either as quoted names or as metric objects such as `tf.keras.metrics.Precision()` and `tf.keras.metrics.Recall()`.
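As a sketch of how binary cross-entropy behaves, the snippet below implements the standard formula, -(1/N) * sum(y*log(y_hat) + (1 - y)*log(1 - y_hat)), in NumPy on made-up predictions. The function name and the clipping constant `eps` are illustrative choices, not taken from Keras.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities away from 0 and 1 to avoid log(0); eps is an illustrative choice.
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])  # confident and on the right side of 0.5
bad = np.array([0.4, 0.6, 0.3, 0.7])   # leaning the wrong way

loss_good = binary_crossentropy(y_true, good)
loss_bad = binary_crossentropy(y_true, bad)
accuracy_good = np.mean((good > 0.5) == y_true)

print(loss_good, loss_bad, accuracy_good)
```

The loss rewards confident correct probabilities and penalizes confident wrong ones, while accuracy only looks at which side of 0.5 each prediction falls on.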

In [None]:
#Training the model
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)

Epoch 1/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.5783 - loss: 0.6624
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7939 - loss: 0.4711
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8081 - loss: 0.4359
Epoch 4/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8229 - loss: 0.4200
Epoch 5/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8328 - loss: 0.4046
Epoch 6/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8267 - loss: 0.4037
Epoch 7/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8294 - loss: 0.4026
Epoch 8/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8402 - loss: 0.3853
Epoch 9/100
250/250 ...

<keras.src.callbacks.history.History at 0x7fda54155dd0>

+ batch_size = the number of training samples the model processes before it updates the weights (mini-batch gradient descent).  
  + 1 - 16 is slow per epoch & uses low memory.
  + 32 - 256 is medium speed & uses a medium amount of memory.
  + 512+ is fast per epoch and uses a large amount of memory.

+ epochs = one epoch is one full pass through the training set. This parameter specifies how many passes to run; in this case, 100.  

See Module 11.  
Remember you can use GridSearchCV to help determine these hyperparameters. See Model_Performance.ipynb
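The `250/250` shown in the training log follows directly from `batch_size`. A quick arithmetic sketch, taking the test-set size from the confusion matrix printed later in the notebook (its four cells sum to 2000 rows) and `test_size = 0.2` from the split:

```python
import math

n_test = 1513 + 82 + 202 + 203  # test rows, summed from the confusion matrix
n_train = n_test * 4            # test_size = 0.2, so training is 4x the test set
batch_size = 32
epochs = 100

# One weight update per batch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```

Halving `batch_size` doubles the number of weight updates per epoch, which is one reason small batches take longer per epoch.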

**Predicting and Evaluating the Model**  
1. Predicting our test results.  
2. Checking the accuracy with a confusion matrix.  
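3. Fine tuning the parameters by wrapping the ANN model into a KerasClassifier to be compatible with GridSearchCV. (Will need to install **scikeras** and **scikit-learn version 1.2.2**. Note that pip reports this pin as conflicting with scikeras 0.13.0, which declares scikit-learn>=1.4.2; see the install output further down.)  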

Predicting our test results for a single observation:
+ Geography = France;
+ Credit Score = 600;
+ Gender = Male;
+ Age = 40;
+ Tenure = 3;
+ Balance = 60000;
+ Number of Products = 2;
+ Has Credit Card = Yes;
+ Is Active Member = Yes;
+ Estimated Salary = 50000.  

In general, pay attention to  
+ the number and order of inputs, and whether the training data was encoded and scaled, so that we know to apply the same encoding and scaling (`sc.transform`) to the prediction input.  
+ putting your data in a 2-D array (note the double square brackets in the code below), because the model was trained on a 2-D array with one row per observation.

Also, we compare `ann.predict(...)` against 0.5 because the model outputs a probability. A probability of 0.5 or less gives 0 / no / False; anything greater than 0.5 gives 1 / yes / True.

True = customer will leave the bank.  
False = customer will not leave the bank.
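The single observation has to follow the column order of the encoded `X`: the three one-hot Geography columns first, then the remaining features. The sketch below assembles such a row; `make_observation` and the two lookup dictionaries are hypothetical helpers written for illustration, with the encodings read off the printed `X` earlier in the notebook (France = 1, 0, 0; Germany = 0, 1, 0; Spain = 0, 0, 1; Female = 0, Male = 1).

```python
import numpy as np

# Encodings as they appear in the printed X (hypothetical lookup tables).
GEOGRAPHY = {"France": [1, 0, 0], "Germany": [0, 1, 0], "Spain": [0, 0, 1]}
GENDER = {"Female": 0, "Male": 1}

def make_observation(geography, credit_score, gender, age, tenure, balance,
                     n_products, has_card, is_active, salary):
    # Hypothetical helper: returns a 2-D array with a single row of 12 features.
    row = GEOGRAPHY[geography] + [credit_score, GENDER[gender], age, tenure,
                                  balance, n_products, int(has_card),
                                  int(is_active), salary]
    return np.array([row], dtype=float)

obs = make_observation("France", 600, "Male", 40, 3, 60000, 2, True, True, 50000)
print(obs.shape)
print(obs)
```

The resulting `obs` could then be passed through the fitted scaler and the model, for example `ann.predict(sc.transform(obs)) > 0.5`.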

In [None]:
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step
[[False]]


In [None]:
#Predicting our test results.
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


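`ann.predict(X_test)` returns a probability for every row of the test set. We also want to see a 1 or 0, so we apply `> 0.5` again. In the printed array, the left column is the prediction and the right column is the actual value.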

In [None]:
#Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[1513   82]
 [ 202  203]]


0.858

Accuracy score = (1513 + 203) / 2000 = 0.858, or about 86%.  
**Good Model**
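Accuracy alone can hide how the model does on the smaller class. As a rough check, the sketch below derives precision and recall for the "Exited" class from the confusion matrix printed above, along with the accuracy of a baseline that always predicts "stays". The variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the output above: rows are actual classes, columns are predicted.
cm = np.array([[1513, 82],
               [202, 203]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)       # of the predicted leavers, how many actually left
recall = tp / (tp + fn)          # of the actual leavers, how many were caught
baseline = (tn + fp) / cm.sum()  # accuracy of always predicting "stays"

print(accuracy, precision, recall, baseline)
```

On these numbers, recall is about 0.50 and the always-"stays" baseline already reaches about 0.80, so the 0.858 accuracy is worth reading alongside precision and recall when deciding whether further tuning is needed.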

**Fine tune parameters.** (In the case the model was not good.)   
+ Wrap the ANN model into a KerasClassifier to be compatible with GridSearchCV.  
+ Install scikeras and scikit-learn using !pip


In [None]:
!pip install scikeras

Collecting scikit-learn>=1.4.2 (from scikeras)
  Downloading scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Downloading scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.5/13.5 MB 90.3 MB/s eta 0:00:00
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.2.2
    Uninstalling scikit-learn-1.2.2:
      Successfully uninstalled scikit-learn-1.2.2
Successfully installed scikit-learn-1.6.1


In [None]:
!pip install scikit-learn==1.2.2

Collecting scikit-learn==1.2.2
  Using cached scikit_learn-1.2.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Using cached scikit_learn-1.2.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.6.1
    Uninstalling scikit-learn-1.6.1:
      Successfully uninstalled scikit-learn-1.6.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
scikeras 0.13.0 requires scikit-learn>=1.4.2, but you have scikit-learn 1.2.2 which is incompatible.
mlxtend 0.23.4 requires scikit-learn>=1.3.1, but you have scikit-learn 1.2.2 which is incompatible.
imbalanced-learn 0.13.0 requires scikit-learn<2,>=1.3.2, but you have scikit-learn 1.2.2 which is incompatible.
Successfully installed scikit-lea

In [None]:
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV

# create_model only takes the arguments it uses; batch_size and epochs are handled by KerasClassifier
def create_model(neurons=10, activation='relu', optimizer='adam'):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.Input(shape=(X_train.shape[1],)))  # one input per feature column
    model.add(tf.keras.layers.Dense(neurons, activation=activation))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# Wrap the model so it exposes the scikit-learn estimator interface
keras_model = KerasClassifier(model=create_model)

# Parameter grid: 4 * 2 * 2 * 2 * 2 = 64 combinations
parameters = {
    'model__neurons': [6, 10, 32, 64],  # The model__ prefix routes the value to create_model
    'model__activation': ['relu', 'tanh'],
    'model__optimizer': ['adam', 'rmsprop'],
    'batch_size': [16, 32],  # batch_size and epochs are KerasClassifier parameters, not create_model arguments
    'epochs': [100, 250]
}

# Initialize GridSearchCV; with cv=3 this trains 64 * 3 = 192 models, so expect a long run
grid = GridSearchCV(estimator=keras_model, param_grid=parameters, cv=3, n_jobs=-1)

# Fit the grid search model
grid_result = grid.fit(X_train, y_train)

# Print the best parameters
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

KeyboardInterrupt: 
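The `KeyboardInterrupt` above suggests the search was stopped by hand before it finished. Counting the work the grid implies helps explain why; the sketch below enumerates the combinations with `itertools.product`, using the same grid and `cv=3` as the cell above.

```python
from itertools import product

# Same grid as in the cell above.
parameters = {
    'model__neurons': [6, 10, 32, 64],
    'model__activation': ['relu', 'tanh'],
    'model__optimizer': ['adam', 'rmsprop'],
    'batch_size': [16, 32],
    'epochs': [100, 250],
}
cv = 3

# Every combination is trained once per cross-validation fold.
combinations = list(product(*parameters.values()))
n_fits = len(combinations) * cv

print(f"{len(combinations)} combinations x {cv} folds = {n_fits} model fits")
```

GridSearchCV also refits the best combination on the full training set by default (`refit=True`), adding one more fit. Trimming the grid or lowering `epochs` is one way to bring the run time down.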
