<font color='mediumblue'>
## How to Build Neural Networks <br>
Constructing neural networks to solve ML problems is a multi-stage process. The key steps are generally as follows:
* ***step 1:*** Load and process the data
* ***step 2:*** Define the model and its architecture
* ***step 3:*** Choose the optimizer and the cost function
* ***step 4:*** Train the model 
* ***step 5:*** Evaluate the model performance on the *unseen* test data
* ***step 6:*** Modify the hyperparameters to optimize performance for the specific data set *(optional)*

## A real example — recognizing handwritten digits ##
We will build networks that can recognize handwritten digits. <br>
To do this, we use MNIST, a database of handwritten digits made up of a training set of 60,000
examples and a test set of 10,000 examples. 
<br>The training examples are annotated by humans
with the correct answer. 
<br>For instance, if the handwritten digit is the number three, then
three is simply the label associated with that example.<br>
Each MNIST image is grayscale and consists of 28 x 28 pixels. A subset of these
digits is shown in the following image:
![Deep%20Learning%20with%20Keras.bmp](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcToXD458Zgqs8VLLzbJPImOK6EgAf4OBquibDaUnw344dkcd1kT)

We will build six different neural networks and compare their performance:
* [Case #1: Defining a simple neural net in Keras](#Case-#1:-Defining-a-simple-neural-net-in-Keras)
* [Case #2: Improving the simple net in Keras with hidden layers](#Case-#2:-Improving-the-simple-net-in-Keras-with-hidden-layers)
* [Case #3: Further improving the simple net in Keras with dropout](#Case-#3:-Further-improving-the-simple-net-in-Keras-with-dropout)
* [Case #4: Testing different optimizers in Keras](#Case-#4:-Testing-different-optimizers-in-Keras)
* [Case #5: Adopting regularization for avoiding overfitting](#Case-#5:-Adopting-regularization-for-avoiding-overfitting)
* [Case #6: Optimizing hyperparameters with Keras-scikit-wrapper](#Case-#6:-Optimizing-hyperparameters-with-Keras-scikit-wrapper)

<font color = "#CC3D3D">
## Case #1: Defining a simple neural net in Keras

##### Set Up

In [None]:
# To visualize and save the model, install the related packages as shown below.

#!pip install pydot graphviz h5py

In [None]:
#
# Setting for obtaining reproducible results
#

import numpy as np
import tensorflow as tf
import random as rn

# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/keras-team/keras/issues/2280#issuecomment-306959926

import os
os.environ['PYTHONHASHSEED'] = '0'

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.

np.random.seed(42)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.

rn.seed(12345)

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, 
                              inter_op_parallelism_threads=1)

from keras import backend as K

# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, 
# see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.set_random_seed(1234)

sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
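
As a small illustration of why the seeds above matter, the sketch below shows that re-seeding a generator replays the same sequence of random numbers. It uses only the standard library and NumPy and does not touch TensorFlow; the helper name `draw` is hypothetical.

```python
import random
import numpy as np

def draw(seed):
    """Seed both generators, then draw a few numbers from each."""
    random.seed(seed)
    np.random.seed(seed)
    return [random.random() for _ in range(3)], np.random.rand(3)

py_a, np_a = draw(42)
py_b, np_b = draw(42)   # same seed: the sequences repeat
py_c, np_c = draw(7)    # different seed: different sequences

print(py_a == py_b, np.array_equal(np_a, np_b))
print(py_a == py_c)
```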

In [None]:
import keras, sklearn
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
from keras.utils import np_utils

### Step 1: Load and Process the Data ###

In [None]:
# MNIST image data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
#X_train is 60000 rows of 28x28 values --> reshaped in 60000 x 784
RESHAPED = 784 # total number of pixels per image (28 x 28)

X_train = X_train.reshape(60000, RESHAPED) # 60,000 training images
X_test = X_test.reshape(10000, RESHAPED) # 10,000 test images
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

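# Each pixel value ranges from 0 to 255 (0 is the background, 255 is full stroke intensity).
# Neural networks need their input data rescaled, so map the values to [0, 1].

# normalize
X_train /= 255 
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
# to_categorical: the targets are the digits 0-9, so one-hot encode them into 10 classes.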
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
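
To make the preprocessing concrete, here is a minimal NumPy-only sketch of the same three operations (flatten, scale, one-hot encode) on two made-up images. The arrays `images` and `labels` are hypothetical stand-ins, and `np.eye(10)[labels]` is used here as an assumed equivalent of `to_categorical` for integer labels.

```python
import numpy as np

# Stand-in for a batch of two 28 x 28 grayscale images (values 0..255).
images = np.zeros((2, 28, 28), dtype=np.uint8)
images[0, 14, 14] = 255
images[1, 0, 0] = 128
labels = np.array([3, 0])

# Flatten each image to a 784-long vector and scale to [0, 1].
flat = images.reshape(len(images), -1).astype('float32') / 255

# One-hot encode: row i has a single 1 in column labels[i].
one_hot = np.eye(10, dtype='float32')[labels]

print(flat.shape, flat.max())
print(one_hot[0])
```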

### Step 2: Define the Model & its Architecture ###

In [None]:
# 10 outputs
# final stage is softmax
model = Sequential()

# Dense: 784 inputs (RESHAPED) fully connected to 10 output units (this model has no hidden layer)
model.add(Dense(10, input_shape=(RESHAPED,))) 
model.add(Activation('softmax'))

model.summary() # print a summary of the model
# "Param #" counts the trainable weights: 784*10 weights + 10 biases = 7,850.
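
The parameter count reported by `model.summary()` can be checked by hand. The sketch below uses a hypothetical helper, `dense_param_count`, that adds up the weights and biases of a stack of fully connected layers given only the layer sizes.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total

print(dense_param_count([784, 10]))              # 7850
print(dense_param_count([784, 32, 32, 32, 10]))  # 27562
```

The second call shows how quickly the count grows once hidden layers are added, even with only 32 units each.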

In [None]:
from IPython.display import Image
from keras.utils.vis_utils import model_to_dot

Image(model_to_dot(model,show_shapes=True, show_layer_names=False).create(prog='dot', format='png'))

### Step 3: Choose the Optimizer and the Cost function

In [None]:
# optimizer: SGD (stochastic gradient descent) updates the weights step by step to minimize the loss
model.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])
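
For intuition about the cost function chosen above, the following is a minimal pure-Python sketch of a softmax output followed by categorical cross-entropy for a single sample. The scores and the `eps` clipping value are illustrative assumptions, not the exact Keras implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [z - max(logits) for z in logits]  # subtract max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

probs = softmax([2.0, 1.0, 0.1])
loss_right = categorical_crossentropy([1, 0, 0], probs)  # true class has the top score
loss_wrong = categorical_crossentropy([0, 0, 1], probs)  # true class has the lowest score
print(probs, loss_right, loss_wrong)
```

The loss is small when the network puts most of the probability on the true class and grows as that probability shrinks.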

### Step 4: Train the Model

In [None]:
%%time
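# validation_split=0.2: train on 80% of the training data and hold out the remaining 20% for validation
# verbose: how much progress information to print (0 = silent, 1 = progress bar, 2 = one line per epoch)
history = model.fit(X_train, Y_train, batch_size=128, epochs=30, verbose=1, 
                    validation_split=0.2)
# The returned History object is kept so the training curves can be inspected later.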
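
The arithmetic behind these `fit` arguments can be sketched as follows. The numbers assume the 60,000-sample MNIST training set used here; the variable names are illustrative only.

```python
import math

n_samples = 60000        # size of the MNIST training set
validation_split = 0.2
batch_size = 128

# Keras holds out the last fraction of the samples, before any shuffling.
split_at = int(n_samples * (1 - validation_split))
n_train = split_at
n_val = n_samples - split_at

# Number of weight updates per epoch on the remaining training samples.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)
```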

In [None]:
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.title("Loss")
plt.show()

### Step 5: Evaluate the Model performance

In [None]:
# evaluate: computes the loss and the metrics over the whole test set
# In Keras, predict() returns the class probabilities.
score = model.evaluate(X_test, Y_test, verbose=1)
print("Test score:", score[0])
print('Test accuracy:', score[1])
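
The accuracy reported above is the fraction of samples whose most probable predicted class matches the true class. A minimal NumPy sketch, using made-up probabilities in place of real `predict()` output:

```python
import numpy as np

# Hypothetical predicted probabilities for 4 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.2, 0.6]])
# One-hot true labels: classes 0, 1, 0, 2.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

pred_classes = probs.argmax(axis=1)    # most probable class per sample
true_classes = y_true.argmax(axis=1)   # undo the one-hot encoding
accuracy = float((pred_classes == true_classes).mean())
print(pred_classes, accuracy)          # [0 1 1 2] 0.75
```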

##### Save & Load the Model

In [None]:
from keras.models import load_model

In [None]:
model.save('mnist_dnn_01.h5')
%ls

In [None]:
model2 = load_model('mnist_dnn_01.h5')
Image(model_to_dot(model2,show_shapes=True, show_layer_names=False).create(prog='dot', format='png'))

<font color = "#CC3D3D"><br>
## Case #2: Improving the simple net in Keras with hidden layers

### Step 2: Define the Model & its Architecture ###

In [None]:
model = Sequential()

# First hidden layer
model.add(Dense(32, input_shape=(RESHAPED,)))
model.add(Activation('relu')) # use ReLU as the activation function

# Second hidden layer
model.add(Dense(32, activation='relu')) # same as the two lines above, written in one line
#model.add(Activation('relu'))

# Third hidden layer
model.add(Dense(32))
model.add(Activation('relu'))

# Output layer
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

# Total number of trainable parameters: 27,562

In [None]:
Image(model_to_dot(model,show_shapes=True, show_layer_names=False).create(prog='dot', format='png'))

### Step 3: Choose the Optimizer and the Cost function

In [None]:
model.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])
#model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

### Step 4: Train the Model

In [None]:
history = model.fit(X_train, Y_train, batch_size=64, epochs=30, verbose=1, 
                    validation_split=0.2)

plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.title("Loss")
plt.show()

### Step 5: Evaluate the Model performance

In [None]:
score = model.evaluate(X_test, Y_test, verbose=1)
print("Test score:", score[0])
print('Test accuracy:', score[1])

model.save('mnist_dnn_02.h5')

<font color = "#CC3D3D"><br>
## Case #3: Further improving the simple net in Keras with dropout

### Step 2: Define the Model & its Architecture ###

In [None]:
from keras.layers.core import Dropout

# Dropout: can be added after individual layers to reduce overfitting.
model = Sequential()

model.add(Dense(128, input_shape=(RESHAPED,)))
model.add(Activation('relu'))
model.add(Dropout(0.2)) # randomly drop 20% of the units during training

model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()
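
What a dropout layer does to its inputs can be sketched in a few lines of NumPy. This is an illustrative "inverted dropout" under the assumption that kept units are rescaled by `1 / (1 - rate)` during training and that the layer is the identity at inference time; the function name `dropout` and its arguments are hypothetical.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x  # inference: the layer passes its input through unchanged
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones((1000, 128), dtype='float32')
out_train = dropout(x, rate=0.2, training=True, rng=rng)
out_infer = dropout(x, rate=0.2, training=False, rng=rng)

print((out_train == 0).mean())  # close to the dropout rate
print(out_train.mean())         # close to 1.0 thanks to the rescaling
```

Because of the rescaling, the expected activation is the same with and without dropout, so nothing has to change when the model is evaluated.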

### Step 3: Choose the Optimizer and the Cost function

In [None]:
model.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])

### Step 4: Train the Model

In [None]:
history = model.fit(X_train, Y_train, batch_size=128, epochs=30, verbose=1, 
                    validation_split=0.2)

plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.title("Loss")
plt.show()

### Step 5: Evaluate the Model performance

In [None]:
score = model.evaluate(X_test, Y_test, verbose=1)
print("Test score:", score[0])
print('Test accuracy:', score[1])

model.save('mnist_dnn_03.h5')

<font color = "#CC3D3D"><br>
## Case #4: Testing different optimizers in Keras 

### Step 2: Define the Model & its Architecture ###

In [None]:
model = Sequential()
model.add(Dense(128, input_shape=(RESHAPED,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
model.summary()

### Step 3: Choose the Optimizer and the Cost function

In [None]:
from keras.optimizers import Adam

model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])
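
To show how Adam differs from plain SGD, here is a minimal sketch of the Adam update rule on a single scalar parameter. The toy loss, the starting point and the learning rate of 0.01 are assumptions chosen for illustration; they are not the settings Keras uses by default.

```python
import math

def adam_minimize(grad_fn, w, steps, lr=0.01, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """Run `steps` Adam updates on a single scalar parameter `w`."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g        # running mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)            # bias correction for the early steps
        v_hat = v / (1 - beta_2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy loss f(w) = w**2 with gradient 2*w, starting from w = 1.0.
w_final = adam_minimize(lambda w: 2 * w, w=1.0, steps=50)
print(w_final)
```

Each step is scaled by the ratio of the two running averages, so the step size stays near `lr` regardless of how large the raw gradient is.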

### Step 4: Train the Model

In [None]:
history = model.fit(X_train, Y_train, batch_size=64, epochs=30, verbose=1, 
                    validation_split=0.2)

plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.title("Loss")
plt.show()

### Step 5: Evaluate the Model performance

In [None]:
score = model.evaluate(X_test, Y_test, verbose=1)
print("Test score:", score[0])
print('Test accuracy:', score[1])

model.save('mnist_dnn_04.h5')

<font color = "#CC3D3D"><br>
## Case #5: Adopting regularization for avoiding overfitting
<br><img src="https://i.stack.imgur.com/j2F6j.png" width=600 height=400>

### Step 2: Define the Model & its Architecture

In [None]:
# Ways to reduce overfitting
from keras import regularizers

# Max norm constraints: 
# refer to http://cs231n.github.io/neural-networks-2/#reg
from keras.constraints import max_norm

# Batch normalization layer normalizes the activations of the previous layer at each batch,
# i.e. applies a transformation that maintains the mean activation close to 0 
# and the activation standard deviation close to 1.
from keras.layers import BatchNormalization

model = Sequential()
#model.add(Dense(128, input_shape=(RESHAPED,), kernel_regularizer=regularizers.l2(0.01), kernel_initializer="glorot_normal"))
model.add(Dense(128, input_shape=(RESHAPED,), kernel_constraint=max_norm(2.), kernel_initializer="he_normal"))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.2))
#model.add(Dense(64, kernel_regularizer=regularizers.l2(0.01), kernel_initializer="glorot_normal"))
model.add(Dense(64, kernel_constraint=max_norm(2.), kernel_initializer="he_normal"))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
model.summary()
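
The max-norm constraint used above can be illustrated with a small NumPy sketch. It assumes the constraint is applied per unit, that is, to each column of the weight matrix, and rescales any column whose L2 norm exceeds `max_value`; the function name `apply_max_norm` is hypothetical.

```python
import numpy as np

def apply_max_norm(weights, max_value=2.0, eps=1e-7):
    """Rescale each column (one unit's incoming weights) to L2 norm <= max_value."""
    norms = np.sqrt(np.sum(np.square(weights), axis=0, keepdims=True))
    desired = np.clip(norms, 0, max_value)
    return weights * (desired / (eps + norms))

w = np.array([[3.0, 0.3],
              [4.0, 0.4]])          # column norms: 5.0 and 0.5
w_constrained = apply_max_norm(w)
col_norms = np.linalg.norm(w_constrained, axis=0)
print(col_norms)
```

Only the first column is shrunk; the second is already inside the limit and is left essentially unchanged.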

### Step 3: Choose the Optimizer and the Cost function

In [None]:
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])

### Step 4: Train the Model

In [None]:
# Early stopping halts training once the monitored quantity (the validation loss by default) stops improving.
# This callback ends training early when there is no further room for improvement.
from keras.callbacks import EarlyStopping
early_stop = EarlyStopping(patience=3)
# patience: instead of stopping at the first epoch without improvement, this sets
#           how many epochs without improvement to wait.
# For example, with patience=10 training stops after 10 consecutive epochs without improvement.

history = model.fit(X_train, Y_train, batch_size=128, epochs=30, verbose=1, 
                    validation_split=0.2, callbacks=[early_stop])

plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.title("Loss")
plt.show()
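
The patience logic behind early stopping can be sketched in plain Python. This is a simplified model of the behavior, not the Keras source; the function name `early_stopping_epoch` and the list of validation losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 2.
losses = [1.00, 0.80, 0.70, 0.71, 0.72, 0.73, 0.60]
stopped_at = early_stopping_epoch(losses, patience=3)
print(stopped_at)  # 5
```

Note that the run stops at epoch 5 and never sees the lower loss at epoch 6, which is the trade-off a small patience value makes.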

### Step 5: Evaluate the Model performance

In [None]:
score = model.evaluate(X_test, Y_test, verbose=1)
print("Test score:", score[0])
print('Test accuracy:', score[1])

model.save('mnist_dnn_05.h5')

<font color = "#CC3D3D"><br>
## Case #6: Optimizing hyperparameters with Keras-scikit-wrapper

##### 1) Define a function which constructs, compiles and returns a Keras model

In [None]:
def dnn_model(optimizer='adam', dropout_rate=0.0):
    # Define the model & its architecture    
    model = Sequential()
    model.add(Dense(128, input_shape=(RESHAPED,), activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(10, activation='softmax'))
    # Choose the optimizer and the cost function
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    # Return model           
    return model

##### 2) Instantiate `KerasClassifier` which implements the Scikit-Learn classifier interface

In [None]:
from keras.wrappers.scikit_learn import KerasClassifier
# call Keras scikit wrapper
dnn = KerasClassifier(build_fn=dnn_model, epochs=1)

##### 3) Apply scikit-learn's `RandomizedSearchCV` (or `GridSearchCV`)

In [None]:
from sklearn.model_selection import RandomizedSearchCV

# Specify parameters and distributions to sample from
param_dist = {
    'dropout_rate': [0.0, 0.2, 0.5], 
    'optimizer': ['rmsprop', 'adam'], 
    'batch_size': [32, 64, 128]
}

# Run randomized search
# n_iter: int, number of parameter settings sampled from param_dist
n_iter_search = 5
random_search = RandomizedSearchCV(dnn, param_distributions=param_dist, n_iter=n_iter_search, cv=3)
random_search.fit(X_train, y_train)
print(random_search.score(X_test, y_test))

# Summarize results
print("Best: %f using %s" % (random_search.best_score_, random_search.best_params_))
means = random_search.cv_results_['mean_test_score']
stds = random_search.cv_results_['std_test_score']
params = random_search.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %s" % (mean, stdev, param))
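
To see how much work the randomized search saves, the sketch below enumerates the full grid for the same parameter lists and samples `n_iter` settings from it using only the standard library. It is an illustration of the idea rather than scikit-learn's implementation, and the fit count assumes the default behavior of refitting the best model once on the full training set.

```python
import itertools
import random

param_dist = {
    'dropout_rate': [0.0, 0.2, 0.5],
    'optimizer': ['rmsprop', 'adam'],
    'batch_size': [32, 64, 128],
}

# Every combination a full grid search would try.
names = sorted(param_dist)
grid = [dict(zip(names, values))
        for values in itertools.product(*(param_dist[n] for n in names))]

# Randomized search tries only n_iter of them, drawn without replacement.
n_iter, cv = 5, 3
rng = random.Random(0)
sampled = rng.sample(grid, n_iter)

n_fits = n_iter * cv + 1  # cross-validation fits plus one final refit
print(len(grid), len(sampled), n_fits)  # 18 5 16
```

A full grid search over the same lists would need 18 * 3 + 1 = 55 fits instead of 16.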

## End