# Neural Networks 

Artificial neural networks (ANNs) are made up of many interconnected neurons organized in layers. Neurons in one layer pass messages to neurons in the next layer. The perceptron is the simplest case: a network with only an input layer and an output layer, used for simple classification. Modern deep neural networks consist of many hidden layers.

## Perceptrons 

The perceptron is a simple neural network. Given an input vector $x$ of $m$ values $(x_1, x_2, \ldots, x_m)$, it outputs either 1 or 0 according to the following function:

$$
f(x) = \begin{cases} 1 & \text{if } wx + b > 0 \\ 0 & \text{otherwise} \end{cases}
$$

Here $w$ is a vector of weights, $wx$ is the dot product $\sum_{j=1}^{m} w_j x_j$, and $b$ is the bias. The equation $wx + b = 0$ defines a boundary hyperplane whose position changes according to the values assigned to $w$ and $b$.

Note: A hyperplane is a subspace whose dimension is one less than that of its ambient space.
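
The decision rule above can be written directly in NumPy. This is a minimal sketch: the function name `perceptron` and the weights are illustrative assumptions, picked by hand (not learned) so that the hyperplane $x_1 + x_2 = 1.5$ separates the logical AND of two binary inputs.

```python
import numpy as np

def perceptron(x, w, b):
    """Return 1 if w.x + b > 0, else 0 (the step rule f(x) above)."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Hand-picked (hypothetical) weights and bias: only (1, 1) lies on the
# positive side of the hyperplane x1 + x2 - 1.5 = 0.
w = np.array([1.0, 1.0])
b = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))
```

Changing `w` or `b` moves the hyperplane, which is exactly what training adjusts.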

## Implementing a simple neural network
Note: There are three ways of creating a model in tf.keras:
 * Sequential
 * Functional
 * Model subclassing
A Sequential() model is a linear stack of neural network layers. A layer is dense if each of its neurons is connected to all neurons in the previous layer and to all neurons in the following layer.
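
To compare the three styles, here is the same one-layer softmax classifier built each way. This is a sketch assuming a TensorFlow 2.x installation; `n_features`, `n_classes` and `TinyModel` are illustrative names, not part of the original notebook.

```python
import tensorflow as tf

n_features, n_classes = 4, 10

# 1. Sequential: a linear stack of layers.
seq_model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(n_classes, activation='softmax'),
])

# 2. Functional: connect tensors explicitly from inputs to outputs.
inputs = tf.keras.Input(shape=(n_features,))
outputs = tf.keras.layers.Dense(n_classes, activation='softmax')(inputs)
func_model = tf.keras.Model(inputs=inputs, outputs=outputs)

# 3. Model subclassing: create layers in __init__, define the forward pass in call().
class TinyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(n_classes, activation='softmax')

    def call(self, x):
        return self.dense(x)

sub_model = TinyModel()
sub_model(tf.zeros((1, n_features)))  # calling once builds the weights

for m in (seq_model, func_model, sub_model):
    print(type(m).__name__, m.count_params())
```

All three hold 4 x 10 weights plus 10 biases. Sequential is the simplest; the functional API allows branching graphs; subclassing allows arbitrary Python in `call()`.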

In [None]:
import tensorflow as tf 
print(tf.__version__)
from tensorflow import keras
print(keras.__version__)

In [None]:
no_classes = 10   # prediction categories 
shape = 4  # input shape which has 4 features 

In [None]:
model = tf.keras.models.Sequential() 
model.add(
    keras.layers.Dense(
        no_classes,
        input_shape = (shape,), 
        kernel_initializer = 'zeros', 
        name = 'dense_layer',
        activation = 'softmax')
)                 

In [None]:
model.summary()

The kernel_initializer parameter accepts several choices; the most common are:
 * random_uniform: weights are initialized uniformly at random in the range (-0.05, 0.05).
 * random_normal: weights are initialized according to a Gaussian distribution with zero mean and a small standard deviation of 0.05.
 * zeros: all weights are initialized to zero.
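
A quick way to see what each initializer produces is to sample a kernel of the same shape as the dense layer above (4 inputs x 10 units). This sketch assumes TensorFlow 2.x and passes the default ranges listed above explicitly; the dictionary names are illustrative.

```python
import numpy as np
import tensorflow as tf

kernel_shape = (4, 10)  # 4 input features x 10 units, as in the model above

initializers = {
    'random_uniform': tf.keras.initializers.RandomUniform(minval=-0.05, maxval=0.05),
    'random_normal': tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.05),
    'zeros': tf.keras.initializers.Zeros(),
}

# Draw one kernel from each initializer and summarize it.
samples = {name: np.asarray(init(shape=kernel_shape)) for name, init in initializers.items()}
for name, w in samples.items():
    print(f'{name:15s} min={w.min():+.4f} max={w.max():+.4f} std={w.std():.4f}')

# An initializer object (or its string name) can be passed to a layer.
layer = tf.keras.layers.Dense(10, kernel_initializer=initializers['random_normal'])
```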

## Multi-layer perceptrons

## Activation functions
* sigmoid: defined as $\sigma(z) = 1/(1+e^{-z})$. Its output varies in the range $(0, 1)$ as the input varies over $(-\infty, +\infty)$. If $z = wx + b$ is very large and positive, then $e^{-z} \to 0$, so $\sigma(z) \to 1$. If $z$ is very large and negative, then $e^{-z} \to \infty$, so $\sigma(z) \to 0$.

* tanh: defined as $\tanh(z) = (e^z - e^{-z})/(e^z + e^{-z})$; its output lies in $(-1, 1)$.

* ReLU: defined as $\max(0, z)$.

* ELU: defined as $z$ for $z > 0$ and $\alpha(e^z - 1)$ otherwise, with $\alpha > 0$.
* leaky ReLU: defined as $z$ for $z > 0$ and $\alpha z$ otherwise, with a small constant $\alpha > 0$.
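
The five definitions above translate directly into NumPy. This is a sketch: the `alpha` defaults (1.0 for ELU, 0.01 for leaky ReLU) are common choices assumed here, not values taken from the text, and the explicit `tanh` formula is fine for moderate inputs but `np.tanh` is numerically safer for very large ones.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Direct transcription of the formula; np.tanh avoids overflow for huge |z|.
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
for f in (sigmoid, tanh, relu, elu, leaky_relu):
    print(f'{f.__name__:10s}', np.round(f(z), 4))
```

Note how sigmoid saturates near 0 and 1 at the ends of the range, while ReLU and its variants stay linear for positive inputs.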

## One-hot encoding
Categorical features can be encoded into binary vectors whose length equals the number of categories.
For example, in handwritten digit recognition the digit 3 is encoded as
[0,0,0,1,0,0,0,0,0,0].

This type of representation is called one-hot encoding. 
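
The encoding can be sketched in a few lines of NumPy by setting one position per row to 1. The helper name `one_hot` is illustrative; Keras provides `tf.keras.utils.to_categorical` for the same purpose.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    encoded[np.arange(labels.size), labels] = 1.0  # row i gets a 1 at column labels[i]
    return encoded

print(one_hot([3], 10))  # the digit 3 out of 10 classes
```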

In [None]:
import tensorflow as tf 
import numpy as np 
from tensorflow import keras 

In [None]:
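epochs = 5               # passes over the training set
batch_size = 128         # samples per weight update
verbose = 1              # show a progress bar during training
classes = 10             # digits 0-9
h_hidden = 128           # units per hidden layer (reserved for hidden layers)
validation_split = 0.2   # fraction of training data held out for validation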

In [None]:
mnist = keras.datasets.mnist
(x_train,y_train),(x_test,y_test) = mnist.load_data()

In [None]:
# x_train holds 60000 images of 28x28 pixels; flatten each into a 784-value vector
shape = 784
x_train = x_train.reshape(60000, shape) 
x_test = x_test.reshape(10000, shape) 
x_train = x_train.astype('float32') 
x_test = x_test.astype('float32') 

In [None]:
# normalizing 
x_train = x_train/255
x_test = x_test/255
print(x_train[0].shape) 
print(x_test[0].shape)

In [None]:
# one hot representation of target categories 
y_train = tf.keras.utils.to_categorical(y_train,classes)
y_test = tf.keras.utils.to_categorical(y_test,classes) 

In [None]:
model = tf.keras.models.Sequential() 
model.add(keras.layers.Dense(
    classes, 
    input_shape = (shape,),
    name = 'dense_layer',
    activation = 'softmax'))

#### Binary cross entropy
binary_crossentropy defines the binary logarithmic loss. Suppose that our model predicts a probability $p$ while the target label is $c \in \{0, 1\}$; then the binary cross-entropy is defined as

  $L(p, c) = -c \ln(p) - (1 - c) \ln(1 - p)$. Note that this objective function is suitable for binary label prediction.
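
The formula can be evaluated directly to see how it rewards confident correct predictions and punishes confident wrong ones. This sketch clips $p$ away from 0 and 1 with a small `eps` (an assumed safeguard against $\ln(0)$; the value is chosen here, not taken from Keras).

```python
import numpy as np

def binary_crossentropy(c, p, eps=1e-7):
    """L(p, c) = -c ln(p) - (1 - c) ln(1 - p), with p clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -c * np.log(p) - (1 - c) * np.log(1 - p)

print(binary_crossentropy(1, 0.9))  # confident and right: about 0.105
print(binary_crossentropy(1, 0.1))  # confident and wrong: about 2.303
```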

#### Categorical cross entropy 
categorical_crossentropy defines the multiclass logarithmic loss. Categorical cross-entropy compares the distribution of the predictions with the true distribution, in which the probability of the true class is set to 1 and that of every other class to 0. If the one-hot encoded target is $c$ and the vector of predicted probabilities is $p$, then the categorical cross-entropy is defined as:

$L(c, p) = -\sum_i c_i \ln(p_i)$
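
Because $c$ is one-hot, every term in the sum vanishes except the true class, so the loss reduces to $-\ln(p_{\text{true}})$. The sketch below checks this on a made-up prediction for the digit 3 (the probabilities are illustrative, not model output; `eps` clipping is an assumed safeguard).

```python
import numpy as np

def categorical_crossentropy(c, p, eps=1e-7):
    """L(c, p) = -sum_i c_i ln(p_i), with p clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return -np.sum(np.asarray(c) * np.log(p))

c = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])                       # one-hot target: digit 3
p = np.array([.02, .02, .02, .8, .02, .02, .02, .04, .02, .02])    # predicted probabilities (sum to 1)
print(categorical_crossentropy(c, p))  # equals -ln(0.8), about 0.223
```

With two classes and $c = (1 - c_1, c_1)$, this sum is exactly the binary cross-entropy above.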

In [None]:
#model compilation 
model.compile(optimizer = 'adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy']) 

### Epochs 
The number of epochs is the number of times the model is exposed to the entire training set. At each iteration, the optimizer tries to adjust the weights so that the objective function is minimized.

### Batch_size 
This is the number of training instances observed before the optimizer performs a weight update; there are usually many batches per epoch.
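
With the settings used in this notebook, the number of weight updates per epoch can be worked out by hand. This sketch assumes Keras holds out the last `validation_split` fraction of the 60000 MNIST training samples (the Keras documentation states that validation data is taken from the last samples before shuffling) and counts a final partial batch as one update.

```python
import math

n_train = 60000          # MNIST training images
validation_split = 0.2
batch_size = 128
epochs = 5

n_fit = int(math.floor(n_train * (1.0 - validation_split)))  # samples actually used for fitting
steps_per_epoch = math.ceil(n_fit / batch_size)              # weight updates per epoch
total_updates = steps_per_epoch * epochs

print(n_fit, steps_per_epoch, total_updates)
```

So 48000 samples split into batches of 128 give 375 updates per epoch, or 1875 over 5 epochs.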

In [None]:
## model training
model.fit(x_train,y_train, batch_size = batch_size, 
          epochs = epochs , verbose = verbose, 
          validation_split = validation_split)

In [None]:
## testing the model
test_loss, test_acc = model.evaluate(x_test,y_test) 
print('test_accuracy', test_acc) 

### Adding hidden layers 

In [None]:
n_nodes = 10   # neurons per layer; the last layer also serves as a 10-class output
shape = 23     # number of input features
model = tf.keras.models.Sequential()
model.add(keras.layers.Dense(n_nodes,
          input_shape = (shape, ),
          name = 'first_layer',
          activation = 'relu'))
model.add(keras.layers.Dense(n_nodes,
          name = 'second_layer',
          activation = 'relu'))
model.add(keras.layers.Dense(n_nodes,
          name = 'third_layer',
          activation = 'softmax'))
model.summary()
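
The parameter counts in the summary follow from a simple rule: a dense layer with `n_in` inputs and `n_units` neurons has `n_in * n_units` weights plus `n_units` biases. The helper `dense_params` below is a hypothetical name used only to check the arithmetic for the three-layer model above.

```python
def dense_params(n_in, n_units):
    """Weights (n_in * n_units) plus one bias per unit."""
    return n_in * n_units + n_units

# (inputs, units) for first_layer, second_layer, third_layer
layer_sizes = [(23, 10), (10, 10), (10, 10)]
per_layer = [dense_params(n_in, n_units) for n_in, n_units in layer_sizes]
total = sum(per_layer)
print(per_layer, total)
```

This gives 240, 110 and 110 parameters, 460 in total, which is what the summary should report.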

### Adding dropout between layers
    dropout_ratio = 0.3   # fraction of units randomly dropped during training
    model.add(keras.layers.Dropout(dropout_ratio))
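
A complete MNIST-sized network with dropout after each hidden layer might look like the sketch below. It assumes TensorFlow 2.x; the layer names and the choice of two 128-unit hidden layers are illustrative. The second half shows that dropout only acts when `training=True`: some entries are zeroed and the survivors are scaled by `1 / (1 - rate)`, so the expected activation is unchanged.

```python
import numpy as np
import tensorflow as tf

shape, classes, n_hidden, dropout_ratio = 784, 10, 128, 0.3

# Dense -> Dropout -> Dense -> Dropout -> softmax output
model = tf.keras.Sequential([
    tf.keras.Input(shape=(shape,)),
    tf.keras.layers.Dense(n_hidden, activation='relu', name='hidden_1'),
    tf.keras.layers.Dropout(dropout_ratio),
    tf.keras.layers.Dense(n_hidden, activation='relu', name='hidden_2'),
    tf.keras.layers.Dropout(dropout_ratio),
    tf.keras.layers.Dense(classes, activation='softmax', name='output'),
])
model.summary()

# Dropout in isolation: inactive at inference, active during training.
drop = tf.keras.layers.Dropout(dropout_ratio)
x = tf.ones((1, 10))
eval_out = drop(x, training=False).numpy()   # identical to x
train_out = drop(x, training=True).numpy()   # entries are 0 or 1 / (1 - 0.3)
print(eval_out)
print(train_out)
```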

### Adding Regularization 
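    from tensorflow import keras
    model = keras.models.Sequential()
    # kernel_regularizer adds 0.01 * sum(w**2) over the layer's weights to the loss;
    # activity_regularizer adds a similar L2 penalty on the layer's outputs
    model.add(keras.layers.Dense(64, input_shape = (64,),
                                 kernel_regularizer = keras.regularizers.l2(0.01),
                                 activity_regularizer = keras.regularizers.l2(0.01)))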
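
To see the L2 penalty that a kernel regularizer contributes, the sketch below uses a tiny layer whose weights are all 1, so the result can be checked by hand: 6 weights, each squared to 1, times 0.01 gives 0.06. It assumes TensorFlow 2.x, where a layer's `losses` property collects regularization penalties; the layer sizes are illustrative.

```python
import numpy as np
import tensorflow as tf

# A 2-input, 3-unit layer with all-ones weights and no bias.
layer = tf.keras.layers.Dense(
    3, use_bias=False, kernel_initializer='ones',
    kernel_regularizer=tf.keras.regularizers.l2(0.01))
_ = layer(np.ones((1, 2), dtype='float32'))  # calling the layer builds its 2x3 kernel

# 0.01 * sum of 6 squared ones = 0.06; Keras adds this to the training loss.
penalty = float(sum(layer.losses))
print(penalty)
```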