# Part 8 Deep Learning
## 8.1 Artificial Neural Networks (ANN)
### 8.1.1 The Neuron
![8-1-1](8-1-1.png)

**Input Value**:
- The input value contains all of the independent variables (columns) for a single observation.
- It should be standardized.

**Output Value**:
- It can be continuous, binary, and categorical (several output values in the form of dummy variables).

The input and output value deals with one observation at every single time.

**Synapse**: 
- It represents the weight for each signal.

**Neuron**:
- **STEP 1**: Add the weighted sum of all input values $\sum_{i=1}^m w_ix_i$.
- **STEP 2**: Apply the activation function $\phi(\sum_{i=1}^m w_ix_i)$.
- **STEP 3**: Pass the signal to the next neuron.

### 8.1.2 The Activation Function
**Threshold Function**:

$$
\phi(x)=
\begin{cases}
1 \quad & \text{if } x\geq0\\
0 \quad & \text{if } x<0
\end{cases}
$$

**Threshold Function**:
$$
\phi(x)=\frac{1}{1+e^{-x}}
$$

**Rectifier Function**:
$$
\phi(x)=\max(x,0)
$$

**Hyperbolic Tangent (tanh)**:
$$
\phi(x)=\frac{1-e^{-2x}}{1+e^{-2x}}
$$

### 8.1.3 How Do NNs Learn?
- **STEP 1**: Fit the input value of each row into the neural network to compute the output value $\hat{y}$ for each row.
- **STEP 2**: Compare the output value $\hat{y}$ with the actual value $y$ for each row.
- **STEP 3**: Adjust the weights $w_1,w_2,\dots,w_n$.
- **STEP 4**: Do this process again and again in order to find the optimal weights to minimize the cost function.

### 8.1.4 Gradient Descent
- Batch Gradient Descent: Adjust the weights after running all rows.
- Stochastic Gradient Descent: Adjust the weights after running every single row.

Stochastic gradient descent an avoid converging to a local optimum. 

### 8.1.5 Training the ANN with Stochastic Gradient Descent
- **STEP 1**: Randomly initialize the weights to the small numbers close to $0$ (but not $0$).
- **STEP 2**: Input the first observation of your dataset in the input layer, each feature in one input node.
- **STEP 3**: Forward-Propagration: from left to right, the neurons are activated in a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until the predicted result $y$.
- **STEP 4**: Compare the predicted result to the actual result. Measure the generated error.
- **STEP 5**: Forward-Propagration: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
- **STEP 6**: Repeat Steps 1 to 5 and update the weights after each observation (reinforcement learning); Repeat Steps 1 to 5 but update the weights only after a batch of observations (batch learning).
- **STEP 7**: When the whole training set passed through the ANN, that makes an epoch. Redo more epochs.

### 8.1.6 ANN in Python
#### 8.1.6.1 Part 1: Data Preprocessing

In [1]:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
dataset.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [3]:
# Splitting the dataset into the independent and dependent variables
X = dataset.iloc[:, 3: -1].to_numpy()
y = dataset.iloc[:, -1].to_numpy()

In [4]:
# Encoding the categorical variables
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelEncoder_X_1 = LabelEncoder()
X[:, 1] = labelEncoder_X_1.fit_transform(X[:, 1])
labelEncoder_X_2 = LabelEncoder()
X[:, 2] = labelEncoder_X_2.fit_transform(X[:, 2])
oneHotEncoder = OneHotEncoder(categorical_features=[1])
X = oneHotEncoder.fit_transform(X).toarray()
X = X[:, 1:]

In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.


In [5]:
# Splitting the dataset into the training and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [6]:
# Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)

#### 8.1.6.2 Part 2: Building the ANN

In [7]:
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


In [8]:
# Initializing the ANN
classifier = Sequential()

In `Dense` class, we need to specify:
- `init='uniform'`: Initialize the weights randomly and make sure each weight is close to $0$.
- `activation`: `relu` for the rectifier function, `sigmoid` for the sigmoid function, and `softmax` for the more than two classes.

In [15]:
# Adding the input layer and first hiddern layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))

# Adding the second hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))

# Adding the output layer
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

In `compile` method, we need to specify:
- `optimizer`:Indicate the function to find the best weights, `adam` for stochastic gradient descent. 
- `loss`: Indicate which cost function is used to minimize the cost. `categorical_crossentropy` for two classes and `sparse_categorical_crossentropy` for more than two categories. 
- `loss`: The criterion used to evaluate the model. 

In [14]:
# Compling the ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In `fit` method, we need to specify:
- `batch_size`: Number of samples per gradient update.
- `nb_epoch`: Number of epochs to train the model.

In [15]:
# Fitting the ANN to the training set
classifier.fit(X_train, y_train, batch_size=10, epochs=100)

Instructions for updating:
Use tf.cast instead.


  


Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<keras.callbacks.callbacks.History at 0x1a29589780>

#### 8.1.6.3 Part 3: Making the Predictions and Evaluating the Models

In [20]:
# Predicting the test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
y_pred

array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])

In [27]:
# Making the confustion matrix and calculating the scores
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
cm = confusion_matrix(y_test, y_pred)
cm

array([[1531,   64],
       [ 256,  149]])

In [28]:
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1 Score:', f1_score(y_test, y_pred))

Accuracy: 0.84
Precision: 0.6995305164319249
Recall: 0.36790123456790125
F1 Score: 0.482200647249191


## 8.2 Convolutional Neural Network (CNN)
### 8.2.1 STEP 1: Convolution

$$(f*g)(t):=\int_{-\infty}^{\infty}f(\tau)g(t-\tau)d\tau$$

![8-2-1](8-2-1.png)
![8-2-2](8-2-2.png)

### 8.2.2 STEP 1(B): ReLULayer
The function of ReLULayer is to remove the linearity among different pixels.

![8-2-3](8-2-3.png)

### 8.2.3 STEP 2: Pooling
Pooling is useful to 
- Identify features despite distortion and tilt of the image
- Reduce the size of the image
- Avoid overfitting
- Preserve the main feature of the original image

![8-2-4](8-2-4.png)

### 8.2.4 STEP 3: Flattening
![8-2-5](8-2-5.png)

### 8.2.5 STEP 4: Full Connection
![8-2-6](8-2-6.png)

### 8.2.6 Summary
![8-2-7](8-2-7.png)

### 8.2.7 STEP 5: Softmax & Cross-Entropy
![8-2-8](8-2-8.png)

Overall, the cross-entropy function is more significant than MSE and more suitable for CNN.

![8-2-9](8-2-9.png)

### 8.2.8 CNN in Python
#### 8.2.8.1 Building the CNN

In [25]:
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

In [26]:
# Initializing the CNN
classifier = Sequential()

In [27]:
# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))

In [28]:
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size=(2, 2)))

# Adding the second convolutional layer
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))

In [29]:
# Step 3 - Flattening
classifier.add(Flatten())

In [30]:
# Step 4 - Full Connection
classifier.add(Dense(activation='relu', units=128))
classifier.add(Dense(activation='sigmoid', units=1))

In [31]:
# Compiling the CNN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

#### 8.2.8.2 Part 2: Fitting the CNN to the images

In [32]:
# Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        'training_set',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')

test_set = test_datagen.flow_from_directory(
        'test_set',
        target_size=(64, 64),
        batch_size=32,
        class_mode='binary')

classifier.fit_generator(
        training_set,
        steps_per_epoch=8000,
        epochs=25,
        validation_data=test_set,
        validation_steps=2000)

Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/25
  23/8000 [..............................] - ETA: 49:27 - loss: 0.7063 - accuracy: 0.5122

KeyboardInterrupt: 