# Neural Networks

## Keras
Keras Introduction: https://keras.io/

Keras Cheatsheet: https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf

Keras FAQ: https://keras.io/getting-started/faq/

Keras Sequential API: https://keras.io/getting-started/sequential-model-guide/

## What about TensorFlow?

TensorFlow is available as a "backend" for Keras. By default, Keras uses TensorFlow to perform deep learning operations.

More about backends here: https://keras.io/backend/

## Major Differences between Keras and Scikit-Learn

|Sklearn|Keras|
|--|--|
|Use for Machine Learning and limited Deep Learning (MLPClassifier, MLPRegressor)|Use only for Deep Learning|
|Scope: Linear Regression, Logistic Regression, Support Vector Machines, KMeans, PCA, etc|Scope: Deep Learning layers such as Dense, Convolutional, Recurrent|
|Gradient descent is used by only a few estimators, such as SGDRegressor, SGDClassifier, and MLP*|Exclusively uses gradient descent and backpropagation|
|Not designed for long-haul training|Designed for long-haul training, supports saving and resuming training|
|Limited support for incremental fit (partial_fit() on some estimators)|Always fits incrementally: fit() continues from the current weights until you rebuild the network (compile() alone does not reset the weights)|
|Does not support GPU|Supports GPU|
|Does not support TensorFlow|Supports TensorFlow through a backend|
|Provides learning_curve() function for learning curve|Uses [TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) for learning curve|
|Provides cross_validate() function for cross validation|No built-in cross-validation; use the validation_split or validation_data arguments of fit()|
|Most estimators expect a 1-D y; multi-output support depends on the estimator|Supports fit with univariate and multi-variate y output. For multi-class classification, y must be one-hot with categorical_crossentropy, or integer labels with sparse_categorical_crossentropy (more in the workshop)|

There are other minor differences between how the two libraries work. We'll highlight them along the way.

**Caution**: always consult the documentation. Don't assume Keras works like Scikit-learn, or you will waste time debugging.

## Keras Machine Learning Workflow

1. Problem Definition
    - Same as you normally would for any machine learning problem. The key difference with Keras is in the choice of neural networks as the model.

2. Data Engineering
    - Use pandas as you normally would

3. Feature Engineering
    - Use sklearn as you normally would

4. Model Engineering
    
    a. Define initial neural net
        - Define model architecture, such as the input shapes, output shapes, and neural network layers
        - model.compile to pick optimiser, loss function, metrics

    b. Setup training callbacks:
        - Learning curve using TensorBoard
        - Early stopping
        - [Optional] Model checkpoints to automatically save weights after every epoch
    
    c. Train model:
        - model.fit(): Unlike sklearn, fit() is cumulative (it continues from the current weights if you call it repeatedly)

        sklearn:
        ```
            model = SGDRegressor()
            model.fit(X_train, y_train)
            model.fit(X_train, y_train) # RESTARTS from scratch
        ```

        Keras:
        ```
            model.compile(optimizer='adam', loss='mse')
            model.fit(X_train, y_train)
            model.fit(X_train, y_train) # RESUMES training from the current weights
        ```
         
5. Evaluation metrics
    - Keras: model.evaluate() returns the loss and any compiled metrics (similar to model.score() in sklearn)
    - Evaluation metrics in sklearn are more comprehensive. Use them here (e.g. classification_report)

6. Deployment
    - model.save()
    - load_model()
    - model.predict()
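
The difference between restarting and resuming in step 4c can be illustrated with a toy model. This is a minimal NumPy sketch, not Keras or scikit-learn code; `TinyLinearModel` and its `reset_on_fit` flag are hypothetical names used only for illustration.

```python
import numpy as np


class TinyLinearModel:
    """Toy model y = w * x trained by gradient descent on mean squared error."""

    def __init__(self, reset_on_fit):
        self.reset_on_fit = reset_on_fit
        self.w = 0.0

    def fit(self, x, y, epochs=5, lr=0.05):
        if self.reset_on_fit:
            self.w = 0.0  # "sklearn-style": every fit() starts from scratch
        for _ in range(epochs):
            grad = np.mean(2 * (self.w * x - y) * x)  # derivative of the MSE w.r.t. w
            self.w -= lr * grad
        return self


x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x  # the true weight is 2.0

restarting = TinyLinearModel(reset_on_fit=True).fit(x, y)
w_restart_first = restarting.w
restarting.fit(x, y)  # ends at the same weight as the first call

resuming = TinyLinearModel(reset_on_fit=False).fit(x, y)
w_resume_first = resuming.w
resuming.fit(x, y)  # "Keras-style": continues, so it ends closer to 2.0

print(w_restart_first, restarting.w)
print(w_resume_first, resuming.w)
```

Calling `fit()` a second time leaves the restarting model where it was after the first call, while the resuming model keeps improving.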

In this workshop, we'll walk through a simple Keras example to understand how to use it:

https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py

In [1]:
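import os

import tensorflow as tf
import tensorflow.contrib.eager as tfe  # TF 1.x only; used by the commented-out call below

#tfe.enable_eager_execution()

# To select a non-default backend, set this BEFORE importing keras:
#os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
import keras

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler, MinMaxScaler

%matplotlib inline

# Local helper module; provides MyTotoResearch
from MyTotoResearchv4 import *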

Done.


Using TensorFlow backend.


## Load dataset

Keras includes some built-in datasets that are useful for learning and practice:

https://keras.io/datasets/

This notebook does not use them; it loads its own data through the local `MyTotoResearch` helper.

In [2]:
def getAllData(df):
    drop_cols = ['T', 'D', 'N1','N2','N3','N4','N5','N6','N7','L','M','S','R','E','A','V' ,'J','U']
    X = df.drop(drop_cols, axis=1)
    return X


In [3]:
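mtr = MyTotoResearch(algo_no=1)
lresult, df = mtr.load_totodata()

X = mtr.modified_dataset(getAllData(df))

# Standardised copy of the features (zero mean, unit variance per column)
scaler = StandardScaler()
Z = scaler.fit_transform(X)

# NOTE: training below uses the unscaled X; Z is not used further
X_train = X
Y = mtr.getTargets()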

Loaded MyTotoResearch algo_no:  1
1521
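
For reference, the scaling step above amounts to a column-wise z-score. The following is a minimal NumPy sketch of that idea, not the scikit-learn implementation; `standardize` and the demo array are hypothetical.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score: subtract the mean, divide by the population std."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # ddof=0, the same convention StandardScaler uses
    std[std == 0] = 1.0   # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std


X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 60.0]])
Z_demo = standardize(X_demo)
print(Z_demo.mean(axis=0))  # approximately 0 for every column
print(Z_demo.std(axis=0))   # approximately 1 for every column
```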


## Data processing

In [4]:
X_train.shape 

(1521, 7)

In [5]:
y_train = mtr.get_result_n_encoded(1)
y_train = mtr.getTarget(3)
y_train.shape

In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.


(1521,)

In [6]:
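# Exploratory checks; a notebook cell displays only its last expression
y_train.astype(object)     # result is discarded (astype returns a new Series)
y_train.unique().shape     # number of distinct target values (not displayed)
scaler = MinMaxScaler()    # rebinds `scaler`, replacing the StandardScaler fitted earlier
y_train.describe()
#scaler.fit(y_train)
#keras.utils.to_categorical(28, 33)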

count    1521.000000
mean       17.443130
std         6.783946
min         3.000000
25%        12.000000
50%        17.000000
75%        22.000000
max        41.000000
Name: N, dtype: float64

In [7]:
# With loss='categorical_crossentropy', Keras expects the targets as
# categorical (one-hot) vectors rather than class (label) vectors.
# This means that we need to convert the target
# before passing it to fit() if doing multi-class classification.
# (With loss='sparse_categorical_crossentropy', integer labels are passed as-is.)

# convert class vectors to categorical vectors
# 5 to [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]

# num_classes must be at least max(label) + 1, not the number of distinct labels
#num_classes = int(y_train.max()) + 1
#y_train = keras.utils.to_categorical(y_train, num_classes)
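
The conversion described in the comments above can be sketched in plain NumPy. This is an illustration of the idea rather than the Keras implementation; `one_hot` is a hypothetical helper. Because labels are used as column indices, the width has to be at least the largest label plus one. The targets in this notebook range from 3 to 41, so counting distinct values could give a width that is too small.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Turn integer class labels into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1  # labels are column indices, so width = max + 1
    return np.eye(num_classes)[labels]


encoded = one_hot([5, 0, 9], num_classes=10)
print(encoded[0])                # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(one_hot([3, 41]).shape)    # (2, 42)
```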


In [8]:
#y_train

## Feature engineering

This is an example dataset, so not much feature engineering is needed.

## Model engineering

To run TensorBoard for viewing the learning curve:

- Launch another Anaconda Prompt (because TensorBoard runs in its own console):

```
(base) conda activate mldds
(mldds) cd folder\to\this\notebook
(mldds) tensorboard --logdir=logs --host=0.0.0.0
```

If this is the first time you are launching TensorBoard, you will not see any sessions until you call model.fit() with the TensorBoard callback:

```
import time
from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs/mnist_mlp/%d' % time.time())
history = model.fit(X_train, y_train, batch_size=128, epochs=10,
                    callbacks=[tensorboard], validation_split=.25)
```
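
As documented for Keras, `validation_split` holds out the last fraction of the samples, taken before any shuffling. Below is a rough NumPy sketch of that behaviour. The helper name `split_like_validation_split` is hypothetical, and the rounding of the split point is an assumption.

```python
import numpy as np


def split_like_validation_split(x, y, validation_split):
    """Hold out the last fraction of the samples, without shuffling first."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


x_demo = np.arange(1521 * 7).reshape(1521, 7)  # same shape as X_train in this notebook
y_demo = np.arange(1521)
(train_x, train_y), (val_x, val_y) = split_like_validation_split(x_demo, y_demo, 0.25)
print(train_x.shape, val_x.shape)  # (1140, 7) (381, 7)
```

If the rows are ordered (for example by date), the validation set then contains only the most recent rows, which is worth keeping in mind when reading the validation curve.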


In [9]:
X.shape[1]

7

In [10]:
y_train.shape
#y_train.unique().shape

(1521,)

In [11]:
from keras.models import Sequential
from keras.layers import Dense
# from keras.layers import Dropout

# from tensorflow.python.eager.context import context, EAGER_MODE, GRAPH_MODE
# def switch_to(mode):
#     ctx = context()._eager_context
#     ctx.mode = mode
#     ctx.is_eager = mode == EAGER_MODE

# switch_to(GRAPH_MODE)

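# Without a non-linear activation, two stacked Dense layers collapse into one linear map
model = Sequential([
  Dense(10, activation='relu', input_shape=(X.shape[1],)),  # must declare input shape
  Dense(49, activation='softmax')  # one probability per class, as crossentropy losses expect
])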

# model = Sequential()

# input: 784, output: 512 => 784 x 512 weights + 512 bias
# (512 neurons)
# model.add(Dense(16, activation='relu', input_shape=(X.shape[1],)))

# Add fully connected layer with a ReLU activation function
# model.add(Dense(8, activation='relu'))
# model.add(Dropout(0.2))

# input: 512, output: 512 => 512 x 512 weights + 512 bias
# (512 neurons)
#model.add(Dense(512, activation='relu'))
# model.add(Dropout(0.2))

# input: 512, output: 10 => 512 x 10 weights
# (10 neurons)
# softmax converts a set of outputs to probabilities that add up to 1
# model.add(Dense(49, activation='softmax'))

model.summary()
# Param # is W + bias
# Dense: input_shape x output_shape + output_shape
#  (where input_shape = previous layer's output_shape)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                80        
_________________________________________________________________
dense_2 (Dense)              (None, 49)                539       
Total params: 619
Trainable params: 619
Non-trainable params: 0
_________________________________________________________________
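
The `Param #` column of the summary can be checked by hand with the formula from the comments above. This is a small sketch; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


layer_sizes = [7, 10, 49]  # input features, first Dense layer, second Dense layer
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [80, 539] 619
```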


In [30]:
from keras import backend as K

def multi_targets_scorer_function(y_true, y_pred):
    """Hit rate: fraction of rows whose predicted class is among that row's targets.

    y_true and y_pred are symbolic tensors here, so Python-level zip(), sum()
    and `in` cannot be used; the logic is written with Keras backend ops.
    argmax is not differentiable, so this works as a metric, not as a loss.
    """
    # Predicted class = index of the largest output, shape (batch, 1)
    y_hat = K.expand_dims(K.cast(K.argmax(y_pred, axis=-1), K.floatx()), axis=-1)
    # y_true has shape (batch, k); a row is a hit if any of its k targets equals y_hat
    hits = K.any(K.equal(K.cast(y_true, K.floatx()), y_hat), axis=-1)
    return K.mean(K.cast(hits, K.floatx()))


In [31]:
import time

from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from keras.optimizers import Adam

batch_size = 128
epochs = 2000  # upper bound; EarlyStopping normally ends training sooner

#tensorboard = TensorBoard(log_dir='./logs/mnist_mlp/%d' % time.time())
model.compile(
    # Takes integer class labels in the range 0..48 (no one-hot needed) and
    # expects the output layer to produce probabilities (softmax)
    loss='sparse_categorical_crossentropy',
    optimizer=Adam(1e-3),
    # Reported next to the loss during training
    metrics=[multi_targets_scorer_function],
)

# Callbacks: stop once val_loss has not improved for 2 epochs, and keep the best model on disk
callbacks = [EarlyStopping(monitor='val_loss', patience=2),
             ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)]

history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    callbacks=callbacks,
                    validation_split=0.25,  # required so that val_loss exists for the callbacks
                    verbose=1)
#                    callbacks=[tensorboard],
#                    validation_data=(X_test, y_test))
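
The `patience=2` setting of `EarlyStopping` can be understood with a small pure-Python sketch. It illustrates the patience rule only and is not the Keras source; `early_stopping_epoch` and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1    # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None


print(early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.7]))  # 3
```

In this example the later improvement to 0.7 is never reached, which is the trade-off of a small `patience`. The rule needs a validation loss to monitor, so `fit()` has to receive `validation_split` or `validation_data`.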

In [None]:
for layer in model.layers:
    print(layer.get_config())
    print(layer.get_weights())

## Predictions

In [None]:
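# NOTE: X_test / y_test come from the MNIST example this section is adapted from;
# they are not defined in the cells above.
# for display, un-flatten to 28x28
plt.imshow(X_test[7].reshape(28, 28))

# argmax converts one-hot to the class label (the index of the maximum value)
# [0 .... 0 1] => 9 (the 1 sits at index 9 of the one-hot array)
print(y_test[7].argmax())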

# need the flattened (784) shape for predict because the model
# expects it

In [None]:
# before feeding into Keras, we need to reshape
# input into (batch_index, 784)

# Typical error when forgetting to reshape:
#
# ValueError: Error when checking input: expected dense_7_input 
# to have shape (784,) but got array with shape (1,)
#
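
The reshaping described above can be tried on a stand-in array without a model. This is a NumPy sketch; `sample` is made-up data in place of one flattened MNIST image.

```python
import numpy as np

sample = np.arange(784, dtype=float)  # stand-in for one flattened 28x28 image
print(sample.shape)                   # (784,)

batch_of_one = sample.reshape(1, -1)  # -1 lets NumPy infer the second dimension
print(batch_of_one.shape)             # (1, 784)

image = sample.reshape(28, 28)        # 2-D layout for display with plt.imshow
print(image.shape)                    # (28, 28)
```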

In [None]:
# reshape to (1, anything)
pred = model.predict(X_test[7].reshape(1, -1)) # can also .reshape(1, 784)
pred.argmax()
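
When the output layer uses softmax, `predict()` returns one probability per class and `argmax` picks the most likely class. Here is a minimal NumPy sketch of that step, with made-up scores instead of a trained model.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


logits = np.array([[2.0, 1.0, 0.1]])  # made-up scores: one sample, three classes
probs = softmax(logits)
print(probs.sum(axis=-1))     # approximately [1.]
print(probs.argmax(axis=-1))  # [0]
```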

In [None]:
model.predict_classes(X_test[7].reshape(1, -1))

## Metrics

In [None]:
y_pred = model.predict_classes(X_test) # return labels so that
                                       # sklearn metrics work
y_pred

In [None]:
# Truth needs to be converted from one-hot to labels again
# so that sklearn metrics work
y_test.argmax(axis=1) # argmax along axis=1, i.e. across the 10 class columns of each row

In [None]:
print(classification_report(y_test.argmax(axis=1), y_pred))
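
To see what the report is built from, the sketch below converts one-hot truth to labels and derives accuracy, precision and recall from a confusion matrix. It uses NumPy only and made-up labels; `confusion_counts` is a hypothetical helper that follows the same rows-are-truth convention as sklearn's `confusion_matrix`.

```python
import numpy as np


def confusion_counts(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


y_true_one_hot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])  # made-up truth
y_true_labels = y_true_one_hot.argmax(axis=1)  # [0 1 2 1]
y_pred_labels = np.array([0, 2, 2, 1])         # made-up predictions

cm = confusion_counts(y_true_labels, y_pred_labels, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
precision = np.diag(cm) / cm.sum(axis=0)  # per predicted class (column sums)
recall = np.diag(cm) / cm.sum(axis=1)     # per true class (row sums)
print(cm)
print(accuracy)  # 0.75
```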