# Big Data Content Analytics - AUEB

## Regularization Techniques + Batch Normalization

* Lab Assistant: George Perakis
* Email: gperakis[at]aeub.gr 

### Batch Normalization

As Pavel said, **Batch Normalization is just another layer**, so you can use it as such to create your desired network architecture.

The common approach is to place BN between the linear layer and its non-linearity: it normalizes the input to the activation function, so the values stay centered in the near-linear region of the activation (such as sigmoid) rather than in its saturated tails.
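To make the normalization step concrete, here is a minimal NumPy sketch of training-mode batch normalization applied before a sigmoid. The helper names (`batch_norm`, `sigmoid`), the fixed `gamma`/`beta` values and the mock pre-activations are assumptions made for illustration; Keras additionally learns `gamma` and `beta` and tracks moving statistics for inference.

```python
import numpy as np


def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature over the batch axis, then scale and shift."""
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mean) / np.sqrt(var + eps)
    return gamma * z_hat + beta


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)

# hypothetical pre-activations: 256 samples, 4 units, far from zero
z = rng.normal(loc=6.0, scale=3.0, size=(256, 4))
z_bn = batch_norm(z)

# without BN most sigmoid outputs sit in the saturated region near 1;
# with BN the inputs are centered around 0, where sigmoid is close to linear
print("mean sigmoid output without BN:", sigmoid(z).mean())
print("mean sigmoid output with BN:   ", sigmoid(z_bn).mean())
print("per-unit mean after BN:", z_bn.mean(axis=0))
print("per-unit std after BN: ", z_bn.std(axis=0))
```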

On the other hand, some empirical studies suggest that it is better to use BN after the activation function:

https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md
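The two orderings hand different inputs to the next layer. The NumPy sketch below illustrates this with ReLU; the `batch_norm` and `relu` helpers and the mock data are assumptions for illustration, and the learned scale and shift are left out (equivalent to `gamma=1`, `beta=0`).

```python
import numpy as np


def batch_norm(z, eps=1e-5):
    """Training-mode BN without the learned scale and shift."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)


def relu(z):
    return np.maximum(z, 0.0)


rng = np.random.default_rng(1)

# hypothetical outputs of a Dense layer: 512 samples, 3 units
z = rng.normal(loc=2.0, scale=1.5, size=(512, 3))

bn_then_act = relu(batch_norm(z))  # Dense -> BN -> ReLU
act_then_bn = batch_norm(relu(z))  # Dense -> ReLU -> BN

# BN before ReLU: the next layer sees non-negative values
print("BN -> ReLU  min:", bn_then_act.min(), "mean:", bn_then_act.mean(axis=0))
# BN after ReLU: the next layer sees zero-mean, unit-variance values
print("ReLU -> BN  min:", act_then_bn.min(), "mean:", act_then_bn.mean(axis=0))
```

Which ordering works better is an empirical question and can depend on the architecture and the data, so it is worth trying both.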

In [1]:
# Recommended reading on Lasso, Ridge and Elastic Net regularization:
# https://medium.com/@yongddeng/regression-analysis-lasso-ridge-and-elastic-net-9e65dc61d6d3

### Imports

In [3]:
import numpy as np
import pandas as pd

# import from the public tensorflow.keras API (tensorflow.python.* is private)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, BatchNormalization
from tensorflow.keras.optimizers import SGD

# used to create mock-up data
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

### Create Mock-up dataset

In [4]:
n_feats = 20

# binary classification problem: only 3 of the 20 features carry signal
X, y = make_classification(n_samples=100_000,
                           n_features=n_feats,
                           n_informative=3,
                           n_redundant=0,
                           n_classes=2,
                           n_clusters_per_class=2)

pd.DataFrame(X).head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -2.222431 | -0.40484 | -1.367708 | -1.838549 | 0.558639 | -1.151048 | -0.326743 | -0.960207 | -0.245035 | 0.891759 | -0.932901 | -1.555012 | 0.666864 | 0.865662 | -2.020125 | 1.041558 | -0.699981 | -0.506071 | -1.436334 | -0.007326 |
| 1 | 2.561517 | -0.242772 | 0.11187 | -1.382386 | -0.933188 | 0.660977 | -0.45238 | 1.188288 | 1.065032 | -0.412277 | -0.193847 | 0.070784 | 0.054065 | -0.278493 | 1.324269 | 0.159381 | 0.139007 | 0.107552 | -1.656939 | 0.042009 |
| 2 | 0.878567 | 0.384237 | 0.561682 | 1.140172 | 1.130895 | -1.254897 | 0.600579 | -1.275912 | -0.601391 | 0.230967 | 0.361643 | -2.221348 | 0.877128 | -0.495148 | 1.822775 | 1.104424 | -1.947562 | 0.957133 | 0.697781 | 0.456173 |
| 3 | -0.316571 | -0.587674 | -0.568607 | 0.490163 | 1.058316 | 0.838888 | -0.879359 | 1.184195 | -0.911086 | -0.824618 | -0.016187 | 0.988806 | 1.195091 | -0.058505 | 0.275869 | 0.248299 | 1.487205 | 0.638569 | -0.579259 | 0.316917 |
| 4 | 1.946009 | -0.631675 | 1.020665 | -1.913442 | -0.588241 | 0.069347 | 0.495116 | 0.128003 | 0.624314 | -0.041611 | -0.90959 | -0.859718 | -1.761045 | 0.006684 | 0.416346 | 0.411292 | 0.268107 | -1.060414 | -0.466921 | 0.551149 |


In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

In [6]:
X_train.shape

(75000, 20)

In [7]:
X_train.shape[0]

75000

### Build Sequential Model

In [8]:
# instantiate model
model = Sequential()

# first block (takes the input features): Dense -> BN -> ReLU -> Dropout
# here BN is placed before the activation
model.add(Dense(128, input_dim=X_train.shape[1]))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))

# second hidden block: Dense -> ReLU -> BN -> Dropout
# here BN is placed after the activation
model.add(Dense(64))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# output block: a single sigmoid unit for binary classification
model.add(Dense(1))
model.add(BatchNormalization())
model.add(Activation('sigmoid'))

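# setting up the optimization of our weights
# `lr` and `decay` are legacy SGD arguments; recent releases expect
# `learning_rate` and a schedule instead. With decay_steps=1 this schedule
# reproduces the legacy rule: lr / (1 + decay * step)
from tensorflow.keras.optimizers.schedules import InverseTimeDecay

lr_schedule = InverseTimeDecay(initial_learning_rate=0.1,
                               decay_steps=1,
                               decay_rate=1e-6)

sgd = SGD(learning_rate=lr_schedule,
          momentum=0.9,
          nesterov=True)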

model.compile(loss='binary_crossentropy',
              optimizer=sgd,
              metrics=['acc'])
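
The `Dropout(0.5)` layers above are only active during training. As a rough illustration of what such a layer does, here is a minimal NumPy sketch of inverted dropout; the `dropout` helper and the mock activations are assumptions for illustration, not the Keras implementation.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        # at inference time the layer is the identity
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    # dividing by keep_prob keeps the expected activation unchanged
    return x * mask / keep_prob


rng = np.random.default_rng(0)
activations = np.ones((1000, 128))  # hypothetical layer outputs

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("values seen in training: ", np.unique(train_out))
print("mean in training:        ", train_out.mean())
print("unchanged at inference:  ", np.array_equal(infer_out, activations))
```

Because the network is handicapped this way only while training, the training metrics reported during `fit` can look worse than the validation metrics, which are computed with dropout switched off.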


In [9]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 128)               2688      
_________________________________________________________________
batch_normalization (BatchNo (None, 128)               512       
_________________________________________________________________
activation (Activation)      (None, 128)               0         
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                8256      
_________________________________________________________________
activation_1 (Activation)    (None, 64)                0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 64)                2
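
The `Param #` values in the fully visible rows of the summary can be reproduced by hand. The sketch below assumes the usual counting: a Dense layer has one weight per input-output pair plus one bias per unit, and a BatchNormalization layer keeps four values per feature (gamma, beta, moving mean, moving variance). The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * n_features


print(dense_params(20, 128))   # dense               -> 2688
print(batchnorm_params(128))   # batch_normalization -> 512
print(dense_params(128, 64))   # dense_1             -> 8256
```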

### Model Fit

In [10]:
# fitting the model on the data
model.fit(X_train,
          y_train,
          epochs=20,
          batch_size=16,
          validation_split=0.2,
          verbose=2)

Epoch 1/20
3750/3750 - 22s - loss: 0.4437 - acc: 0.7969 - val_loss: 0.2911 - val_acc: 0.8841
Epoch 2/20
3750/3750 - 21s - loss: 0.4010 - acc: 0.8248 - val_loss: 0.2696 - val_acc: 0.8949
Epoch 3/20
3750/3750 - 22s - loss: 0.3865 - acc: 0.8316 - val_loss: 0.2524 - val_acc: 0.9127
Epoch 4/20
3750/3750 - 20s - loss: 0.3757 - acc: 0.8388 - val_loss: 0.2415 - val_acc: 0.9105
Epoch 5/20
3750/3750 - 21s - loss: 0.3718 - acc: 0.8388 - val_loss: 0.2431 - val_acc: 0.9120
Epoch 6/20
3750/3750 - 19s - loss: 0.3705 - acc: 0.8423 - val_loss: 0.2557 - val_acc: 0.9151
Epoch 7/20
3750/3750 - 19s - loss: 0.3712 - acc: 0.8417 - val_loss: 0.2430 - val_acc: 0.9117
Epoch 8/20
3750/3750 - 19s - loss: 0.3694 - acc: 0.8434 - val_loss: 0.2329 - val_acc: 0.9157
Epoch 9/20
3750/3750 - 20s - loss: 0.3662 - acc: 0.8429 - val_loss: 0.2455 - val_acc: 0.9079
Epoch 10/20
3750/3750 - 20s - loss: 0.3709 - acc: 0.8409 - val_loss: 0.2575 - val_acc: 0.9074
Epoch 11/20
3750/3750 - 20s - loss: 0.3632 - acc: 0.8453 - val_loss: 

<tensorflow.python.keras.callbacks.History at 0x2cf101bd588>
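
The `3750/3750` in each log line is the number of batches per epoch. A short sketch of the arithmetic, assuming Keras holds out the `validation_split` fraction of `X_train` first and then runs one step per batch of the remaining samples:

```python
import math

n_train = 75_000          # X_train.shape[0]
validation_split = 0.2
batch_size = 16

n_val = round(n_train * validation_split)   # samples held out for validation
n_fit = n_train - n_val                      # samples actually used for fitting
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_val, n_fit, steps_per_epoch)   # 15000 60000 3750
```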

In [11]:
# A great article on regularization techniques:
# https://theaisummer.com/regularization/