
# **Deep Learning with Python**



> Develop Deep Learning Models on Theano and TensorFlow Using Keras - Jason Brownlee





---

**PART III :: Multilayer Perceptrons** (part 1)

---




## **Chapter 6 - Crash Course In Multilayer Perceptrons**




*   *Multilayer Perceptrons*<br>
    *   Perceptron: a single neuron model that was a precursor to larger neural networks. The field of neural networks investigates how simple models of biological brains can be used to solve difficult computational tasks like the predictive modeling tasks we see in machine learning.
    *   goal is not to create realistic models of the brain, but to develop robust algorithms and data structures to model difficult problems.
    *   The power of neural networks comes from their ability to learn the representation in your training data and how best to relate it to the output variable that you want to predict, i.e. they learn a mapping. In principle they can approximate a very wide range of mapping functions, and they have been proven to be universal approximators.
    *   The predictive capability of neural networks comes from the hierarchical or multilayered structure of the networks: this structure can pick out (learn to represent) features at different scales or resolutions and combine them into higher-order features.
![picture](https://drive.google.com/uc?id=1vpGT5z-mS4L8H2n0gBqCeGgskgkcjv4a)

*   *Neuron Weights*<br>
    *   As in linear regression, each neuron also has a bias, which can be thought of as an input that always has the value 1.0 and which must also be weighted. For example, a neuron with two inputs requires three weights.
    *   Weights are often initialized to small random values, such as values in the range 0 to 0.3 (more complex initialization schemes can also be used). 
    *   As in linear regression, larger weights indicate increased complexity and fragility of the model, so it is desirable to keep the weights in the network small; regularization techniques can be used for this.

*   *Activation*<br>
    *   The weighted inputs are summed and passed through an activation function (also called a transfer function): a simple mapping of the summed weighted input to the output of the neuron.
    *   It is called an activation function because it governs the threshold at which the neuron is activated and the strength of the output signal (e.g. a simple step activation outputs 1.0 if the summed input is above 0.5, else 0.0).
    *   Traditionally, nonlinear activation functions are used: they allow the network to combine the inputs in more complex ways and in turn provide a richer capability in the functions it can model.
    *   Examples:
        *   sigmoid (logistic) function: outputs a value between 0 and 1 with an S-shaped curve.
        *   tanh: outputs the same S-shaped curve over the range -1 to +1.
        *   rectifier (ReLU): returns the input if the input is > 0, else 0.
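A minimal NumPy sketch of a single neuron, assuming made-up inputs and weights (none of these values come from the book): the two inputs plus the bias need three weights, and the summed input is passed through the step, sigmoid, tanh and ReLU activations described above.

```python
import numpy as np

# Hypothetical example: a neuron with two inputs needs three weights,
# two for the inputs and one for the bias (whose "input" is always 1.0).
inputs = np.array([0.5, -1.2])
weights = np.array([0.2, 0.1])   # illustrative values in the 0 to 0.3 range
bias_weight = 0.3

def weighted_sum(x, w, b):
    """Sum of the weighted inputs plus the bias weight times its fixed input of 1.0."""
    return np.dot(x, w) + b * 1.0

def step(z, threshold=0.5):
    """Threshold activation: output 1.0 if the summed input exceeds the threshold, else 0.0."""
    return 1.0 if z > threshold else 0.0

def sigmoid(z):
    """S-shaped curve with outputs between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Same S shape, outputs between -1 and +1."""
    return np.tanh(z)

def relu(z):
    """Rectifier: returns z if z > 0, else 0."""
    return np.maximum(0.0, z)

z = weighted_sum(inputs, weights, bias_weight)   # 0.5*0.2 + (-1.2)*0.1 + 0.3
for name, fn in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(f"{name:8s} -> {float(fn(z)):.4f}")
```

With a summed input of 0.28 the step function stays off, while the smooth activations still pass on a graded signal.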

*   *Networks of Neurons*<br>
    *   Neurons are arranged into networks of neurons - a row of neurons is called a layer, and one network can have multiple layers. 
    *   The architecture of the neurons in the network is called the network topology.
![picture](https://drive.google.com/uc?id=1aDTl9ClcjZKtppNwWj0FGWJMBcOd14DV)
    *   Layers of the network:
      1.   Input or Visible Layers<br>
           *   bottom layer that takes input from your dataset - the exposed part of the network. 
           *   usually drawn with one neuron per input value or column in your dataset - these neurons simply pass the input value through to the next layer.
      2.   Hidden Layers
           *   layers after the input layer - they are not directly exposed to the input. 
           *   Deep learning can refer to having many hidden layers in your neural network. 
      3.   Output Layer
           *   final layer of the network - responsible for outputting a value or vector of values that correspond to the format required for the problem. 
           *   The choice of activation function in the output layer is constrained by the type of modelling problem. 
           *   For example:
              *   regression problem - single output neuron and no activation function.
              *   binary classification problem - single output neuron and sigmoid activation function.
              *   multiclass classification problem - multiple neurons in the output layer (one for each class) and a softmax activation function, which outputs the probability of the network predicting each of the class values. 
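The layer structure can be sketched as a forward pass in plain NumPy. This is an illustration under assumptions, not Keras code: the 4-5 topology, the helper names (`dense`, `init_layer`) and the zero biases are hypothetical. It shows one hidden layer feeding each of the three output-layer choices listed above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the row max for numerical stability; each row then sums to 1
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dense(x, w, b, activation=None):
    """One fully connected layer: weighted sum plus bias, then an optional activation."""
    z = x @ w + b
    return z if activation is None else activation(z)

def init_layer(n_in, n_out):
    # small random weights in [0, 0.3) as in the notes; zero biases are an assumption
    return rng.uniform(0.0, 0.3, size=(n_in, n_out)), np.zeros(n_out)

# Hypothetical topology: 4 inputs -> 5 hidden ReLU neurons -> output layer
w_h, b_h = init_layer(4, 5)
x = rng.normal(size=(2, 4))          # two example rows, one value per input column
hidden = dense(x, w_h, b_h, relu)

# The output layer depends on the problem type
w_r, b_r = init_layer(5, 1)
w_b, b_b = init_layer(5, 1)
w_m, b_m = init_layer(5, 3)
regression_out = dense(hidden, w_r, b_r)            # one neuron, no activation
binary_out = dense(hidden, w_b, b_b, sigmoid)       # one neuron, sigmoid
multiclass_out = dense(hidden, w_m, b_m, softmax)   # one neuron per class, softmax

print(regression_out.shape, binary_out.shape, multiclass_out.shape)
print(multiclass_out.sum(axis=1))   # each row of class probabilities sums to 1
```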

*   *Training Networks*<br>
    * Data Preparation
      *   Data must be numerical, for example real values. 
      *   categorical data can be converted to a real-valued representation called a one hot encoding.
      *   one hot encoding can also be used on the output variable in classification problems with more than one class - it creates a binary vector from a single column, which is easy to compare directly to the output of the network's output layer (one value per class). 
      *   Neural networks require the input to be scaled in a consistent way - either rescale it to the range 0 to 1 (normalization) or standardize it so that each column has a mean of zero and a standard deviation of one. 
      *   Scaling also applies to image pixel data. 
    * Stochastic Gradient Descent
      *   the preferred training algorithm for neural networks is called stochastic gradient descent - one row of data is exposed to the network at a time as input.
      *   forward pass - the network processes the input upward, activating neurons as it goes, to produce an output value; this is also the type of pass used after training to make predictions on new data.
      *   backpropagation - the output of the network is compared to the expected output and an error is calculated; this error is then propagated back through the network, one layer at a time, and the weights are updated according to the amount that they contributed to the error. 
      *   the process is repeated for all of the examples in the training data. One round of updating the network for the entire training dataset is called an epoch. A network may be trained for tens, hundreds or many thousands of epochs.
    * Weight Updates
      *   online learning - weights are updated from the errors calculated for each training example - can result in fast but also chaotic changes to the network.
      *   batch learning - errors can be saved up across all of the training examples and the network can be updated at the end - often more stable.
      *   the batch size (the number of examples the network is shown before an update) is often reduced to a small number, such as tens or hundreds of examples. 
      *   learning rate - the amount by which weights are updated - also called the step size and controls the step or change made to network weights for a given error - small learning rates are usually used such as 0.1 or 0.01 or smaller. 
      *   The update equation can be complemented with additional configuration terms that you can set:
          *   Momentum - incorporates the properties from the previous weight update to allow the weights to continue to change in the same direction even when there is less error being calculated.
          *   Learning Rate Decay - decrease the learning rate over epochs to allow the network to make large changes to the weights at the beginning and smaller fine tuning changes later in the training schedule.
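The data preparation and training notes above can be combined into a small NumPy sketch: standardize the inputs, one hot encode the class labels, then train a single softmax layer with mini-batch SGD, momentum and learning rate decay. The synthetic three-class data, the hyperparameter values and the time-based decay formula `lr0 / (1 + decay * epoch)` are assumptions chosen for illustration; Keras optimizers implement these ideas for you. Because there is only one layer, the error propagated back is simply the softmax output minus the one hot target.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 3 Gaussian blobs in 2-D, with columns on very different scales
centers = np.array([[0.0, 0.0], [4.0, 40.0], [8.0, 0.0]])
X = np.vstack([c + rng.normal(scale=[1.0, 10.0], size=(100, 2)) for c in centers])
y = np.repeat(np.arange(3), 100)

def standardize(X):
    """Rescale each column to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def normalize(X):
    """Alternative scaling: rescale each column to the range 0..1."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def one_hot(labels, n_classes):
    """Turn integer class labels into binary vectors, one column per class."""
    return np.eye(n_classes)[labels]

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

Xs = standardize(X)
Y = one_hot(y, 3)

# Single-layer softmax network trained with mini-batch SGD
W = rng.uniform(0.0, 0.3, size=(2, 3))
b = np.zeros(3)
vW, vb = np.zeros_like(W), np.zeros_like(b)   # momentum "velocity" terms
lr0, momentum, decay, batch_size, epochs = 0.1, 0.9, 0.01, 10, 20

for epoch in range(epochs):
    lr = lr0 / (1.0 + decay * epoch)            # time-based learning rate decay
    order = rng.permutation(len(Xs))            # shuffle rows each epoch
    for start in range(0, len(Xs), batch_size):
        idx = order[start:start + batch_size]
        probs = softmax(Xs[idx] @ W + b)        # forward pass
        err = (probs - Y[idx]) / len(idx)       # gradient of mean cross-entropy w.r.t. the logits
        gW, gb = Xs[idx].T @ err, err.sum(axis=0)
        vW = momentum * vW - lr * gW            # momentum keeps moving in the previous direction
        vb = momentum * vb - lr * gb
        W, b = W + vW, b + vb

accuracy = (softmax(Xs @ W + b).argmax(axis=1) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Setting `batch_size = 1` gives online learning, and `batch_size = len(Xs)` gives batch learning; the mini-batch value sits between the two.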

*   *Prediction*<br>
    *   once trained, a neural network can be used to make predictions - on test or validation data, to estimate the skill of the model on unseen data, or deployed operationally to make predictions continuously.
    *   to do this, the network topology and the final set of weights need to be saved from the model. 
    *   predictions are made by providing the input to the network and performing a forward-pass allowing it to generate an output that you can use as a prediction.
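A sketch of the prediction step, assuming a hypothetical 3-4-1 network whose "trained" weights are random placeholders: the weights are saved (their array shapes also capture the topology), reloaded, and used in a forward pass on new rows. Keras has its own saving functions (for example `model.save()`); `np.savez` is used here only to keep the sketch dependency-free.

```python
import io
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, params):
    """Forward pass: a hidden ReLU layer, then a single sigmoid output neuron."""
    hidden = relu(x @ params["w1"] + params["b1"])
    return sigmoid(hidden @ params["w2"] + params["b2"])

# Hypothetical "trained" weights for a 3-4-1 network (random here, purely illustrative)
rng = np.random.default_rng(42)
params = {"w1": rng.normal(size=(3, 4)), "b1": np.zeros(4),
          "w2": rng.normal(size=(4, 1)), "b2": np.zeros(1)}

# Save the weights ...
buffer = io.BytesIO()          # stands in for a file on disk
np.savez(buffer, **params)
buffer.seek(0)

# ... then reload them later and make predictions on new rows
with np.load(buffer) as data:
    loaded = {name: data[name] for name in data.files}
new_rows = np.array([[0.1, -0.3, 0.8], [1.5, 0.2, -0.7]])
probabilities = predict(new_rows, loaded)
labels = (probabilities > 0.5).astype(int)   # threshold to get a binary class label
print(probabilities.ravel(), labels.ravel())
```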





> Summary



In this lesson you discovered artificial neural networks for machine learning. You learned:
1. How neural networks are not models of the brain but are instead computational models for solving complex machine learning problems.
2. That neural networks are composed of neurons that have weights and activation functions.
3. The networks are organized into layers of neurons and are trained using stochastic gradient descent.
4. That it is a good idea to prepare your data before training a neural network model.




## **Chapter 7 - Develop Your First Neural Network With Keras**





*   The steps covered in this tutorial are as follows; you can reuse them as a template to create your own models in the future:
    1. Load Data:
      *   dataset - Pima Indians onset of diabetes
      *   binary classification problem (onset of diabetes as 1 or not as 0)
      *   all 8 input variables (attributes) are numerical and have varying scales.
      *   it is a good idea to initialize the random number generator with a fixed seed value, so that running the same code again gives the same result - useful if you need to demonstrate a result, compare algorithms using the same source of randomness, or debug a part of your code.
      *   load the file directly using the NumPy function loadtxt() 
      *   split the dataset into input variables (X) and the output class variable (Y)  
    2. Define Model:
      *  models in Keras are defined as a sequence of layers.
      *  create a Sequential model and add layers one at a time.
      *  ensure the input layer has the right number of inputs - this is specified when creating the first layer with the `input_dim` argument, set to 8 for the 8 input variables.
      *  number of layers and their types - often the best network structure is found through a process of trial and error experimentation.
      *  here we use a fully-connected network structure with three layers. Fully connected layers are defined using the Dense class. We can specify the number of neurons in the layer as the first argument and specify the activation function using the activation argument. We will use the rectifier (relu) activation function on the first two layers and the sigmoid activation function in the output layer. <br>
![picture](https://drive.google.com/uc?id=1cVRkai29Nb4yGJ7I7AWH5qApwOwqB2ff)
    3. Compile Model:
      *  once the model is defined, we can compile it
      *  compiling the model uses the efficient numerical libraries under the covers (the so-called backend) such as Theano or TensorFlow.
      *  the backend automatically chooses the best way to represent the network for training and making predictions. Training a network means finding the best set of weights to make predictions for this problem.
      *  when compiling, we must specify the *loss function - here, binary crossentropy* (used to evaluate a set of weights), the *optimizer - here, adam* (used to search through different weights for the network) and optional *metrics - here, classification accuracy* (collected and reported during training). 
    4. Fit Model:
      *  execute the model on some data
      *  train or fit our model on our loaded data by calling the fit() function on the model.
      *  we must also specify the number of *epochs - here, 150* and the *batch size - here, 10* using the `epochs` and `batch_size` arguments.
    5. Evaluate Model:
      *  evaluate the performance of the network on the same dataset. This only gives an idea of how well we have modeled the dataset (i.e. training accuracy), not of how well the algorithm might perform on new data. It is done here for simplicity; ideally the data would be split into train and test sets for training and evaluating your model.
      *  evaluate the model on the training dataset using the `evaluate()` function, passing it the same input and output used to train the model. It generates a prediction for each input and output pair and collects scores, including the average loss and any metrics configured, such as accuracy.
    6. Tie It All Together:
      *  the above steps are combined together into a complete code example.
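The loss and metric chosen at compile time can be written out in NumPy to see what they measure. This is a sketch, assuming hypothetical labels and sigmoid outputs; Keras's own implementation differs in details such as the clipping constant.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of rows where the thresholded prediction matches the label."""
    return np.mean((y_pred > threshold).astype(float) == y_true)

# Hypothetical labels (1 = onset of diabetes) and sigmoid outputs from the last layer
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.2, 0.4, 0.1])
print(binary_crossentropy(y_true, y_pred), binary_accuracy(y_true, y_pred))
```

The third row is misclassified (0.4 falls below the 0.5 threshold), so accuracy is 0.75; the loss also penalizes how far each probability is from its label, which is why it is used to evaluate a set of weights during training.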






```python
# Create your first MLP in Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy

# fix random seed for reproducibility
numpy.random.seed(7)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10)

# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
```



> Summary

In this lesson you discovered how to create your first neural network model using the powerful Keras Python library for deep learning. Specifically you learned the five key steps in using Keras to create a neural network or deep learning model, step-by-step including:
1. How to load data.
2. How to define a neural network model in Keras.
3. How to compile a Keras model using the efficient numerical backend.
4. How to train a model on data.
5. How to evaluate a model on data.



## **Chapter 8 - Evaluate The Performance of Deep Learning Models**



*   need to make a number of decisions when designing and configuring your deep learning models - high-level decisions like the number, size and type of layers in your network and lower level decisions like the choice of loss function, activation functions, optimization procedure and number of epochs.
*   need to have a robust test harness that allows you to estimate the performance of a given configuration on unseen data, and reliably compare the performance to other configurations.
*   *Data Splitting*: typical to use a simple separation of data into training and test datasets or training and validation datasets. 
*   Keras provides two convenient ways of evaluating your deep learning algorithms this way:
    1. Use an automatic verification dataset.
        *   Keras can separate a portion of your training data into a validation dataset and evaluate the performance of your model on that validation dataset at each epoch. You can do this by setting the `validation_split` argument of the `fit()` function to a fraction of the size of your training dataset (e.g. 0.33 for 33%).
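According to the Keras documentation, the rows used by `validation_split` are taken from the end of the arrays passed to `fit()`, before any shuffling, so a dataset sorted by class would give a skewed validation set. A NumPy sketch of that split, assuming a stand-in array with the same 768 x 9 shape as the Pima Indians data (the helper `tail_validation_split` is hypothetical), reproduces the 514/254 counts in the training log below.

```python
import numpy as np

def tail_validation_split(X, Y, validation_split):
    """Hold out the last fraction of rows, in their original order, as validation data."""
    split_at = int(len(X) * (1.0 - validation_split))
    return (X[:split_at], Y[:split_at]), (X[split_at:], Y[split_at:])

# Hypothetical stand-in for the 768-row dataset (8 input columns + 1 label column)
dataset = np.arange(768 * 9, dtype=float).reshape(768, 9)
X, Y = dataset[:, 0:8], dataset[:, 8]
(train_X, train_Y), (val_X, val_Y) = tail_validation_split(X, Y, 0.33)
print(len(train_X), len(val_X))   # 514 254
```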


```python
# MLP with automatic validation set
from keras.models import Sequential
from keras.layers import Dense
import numpy

# fix random seed for reproducibility
numpy.random.seed(7)

# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model
model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10)
```

```
Train on 514 samples, validate on 254 samples
```

2. Use a manual verification dataset.
    *   Keras allows you to manually specify the dataset to use for validation during training. Here we use the handy `train_test_split()` function from the scikit-learn library to separate our data into a training and a test dataset; the test dataset is then passed to `fit()` through the `validation_data` argument, which takes a tuple of the input and output datasets, here `(X_test, y_test)`.
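Unlike `validation_split`, `train_test_split()` shuffles the rows before splitting. A rough NumPy equivalent, assuming the test count is rounded up (which matches the 254 validation rows in the log below), might look like this; `shuffled_split` is a hypothetical helper and will not pick the same rows as scikit-learn for the same seed.

```python
import numpy as np

def shuffled_split(X, Y, test_size, seed):
    """Randomly assign rows to train/test sets; the test count is rounded up."""
    rng = np.random.default_rng(seed)
    n_test = int(np.ceil(test_size * len(X)))
    order = rng.permutation(len(X))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

dataset = np.arange(768 * 9, dtype=float).reshape(768, 9)  # hypothetical stand-in data
X, Y = dataset[:, 0:8], dataset[:, 8]
X_train, X_test, y_train, y_test = shuffled_split(X, Y, test_size=0.33, seed=7)
print(len(X_train), len(X_test))   # 514 254
```

Fixing the seed makes the split repeatable, which is why the Keras example passes `random_state=seed`.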

```python
# MLP with manual validation set
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
import numpy

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), epochs=150, batch_size=10)
```

```
Train on 514 samples, validate on 254 samples
```

*   *Manual k-Fold Cross-Validation*
    *   the gold standard for machine learning model evaluation is k-fold cross-validation - it provides a robust estimate of the performance of a model on unseen data. 
    *   it does this by splitting the training dataset into k subsets and taking turns training models on all subsets except one, which is held out, and evaluating model performance on the held-out validation subset. 
    *   the process is repeated until every subset has had the opportunity to be the held-out validation set. 
    *   the performance measure is then averaged across all models that are created.
    *   it is often not used for evaluating deep learning models because of the greater computational expense; when the problem is small enough or you have sufficient compute resources, k-fold cross-validation can give you a less biased estimate of the performance of your model.
    *   in the example below, we use the handy `StratifiedKFold` class from the scikit-learn library to split the training dataset into 10 folds. The folds are stratified, meaning that the algorithm attempts to balance the number of instances of each class in each fold. The example creates and evaluates 10 models using the 10 splits, turns off the verbose per-epoch output by passing `verbose=0` to `fit()` and `evaluate()`, and prints the performance of each model.


```python
# MLP for Pima Indians Dataset with 10-fold cross validation
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)

cvscores = []
for train, test in kfold.split(X, Y):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    # Fit the model
    model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)
    
    # evaluate the model
    scores = model.evaluate(X[test], Y[test], verbose=0)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    cvscores.append(scores[1] * 100)
    
print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))
```

```
accuracy: 79.22%
accuracy: 79.22%
accuracy: 72.73%
accuracy: 67.53%
accuracy: 70.13%
accuracy: 70.13%
accuracy: 75.32%
accuracy: 74.03%
accuracy: 69.74%
accuracy: 73.68%
73.17% (+/- 3.76%)
```
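To see what stratification buys you, here is a simplified NumPy sketch of stratified fold assignment: the rows of each class are shuffled and dealt out round-robin, so every fold keeps roughly the original class ratio and every row is held out exactly once. The helper name, the labels and the round-robin scheme are assumptions for illustration; this is not scikit-learn's actual `StratifiedKFold` algorithm.

```python
import numpy as np

def stratified_k_fold(labels, k, seed=7):
    """Yield (train_idx, test_idx) pairs with each class spread evenly across the k folds.

    A simplified sketch, not scikit-learn's StratifiedKFold implementation.
    """
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))   # shuffle within the class
        fold_of[idx] = np.arange(len(idx)) % k                  # deal rows out round-robin
    for fold in range(k):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)

# Hypothetical imbalanced labels: 65% class 0, 35% class 1
labels = np.array([0] * 65 + [1] * 35)
for train, test in stratified_k_fold(labels, k=5):
    print(len(train), len(test), labels[test].mean())   # every test fold is 35% class 1
```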



> Summary

In this lesson you discovered the importance of having a robust way to estimate the performance of your deep learning models on unseen data. You learned three ways that you can estimate the performance of your deep learning models in Python using the Keras library:
1. Automatically splitting a training dataset into train and validation datasets.
2. Manually and explicitly defining a training and validation dataset.
3. Evaluating performance using k-fold cross-validation, the gold standard technique.