Work in progress:
- add tensorboard support (see https://stackoverflow.com/questions/47818822/can-i-use-tensorboard-with-google-colab)
- retrain / fine-tune network based on pretrained weights

![Toronto AI](https://i.imgur.com/diILtDP.png)

# Neural Networks

A trained neural net can be thought of as a learned mapping.

Here are some examples of mappings that a neural net could learn:

*   Mapping English to French
*   Mapping pictures to text descriptions
*   Mapping live sensor data from a reusable rocket to control commands that land it
*   Mapping random vectors into images of flowers

In essence, we use neural nets to map one distribution of data onto another.

Here's an example where I trained a neural net to map random vectors onto the space of flower photos, using a Generative Adversarial Network:

![](https://i.imgur.com/SaT9OEM.png)

# Tensors

* Tensors are multidimensional arrays.
* They are like boxes that hold either our data or the weights of our model.
* Tensors are used extensively in TensorFlow to represent:
  * 0-D - scalars
  * 1-D - vectors, text
  * 2-D - matrices, tables of data
  * 3-D - batches of matrices, a cube of data, e.g. an image, a monochrome video
  * 4-D - convolution kernels, a colour video
  * 5-D - batches of colour video
  * 6-D - 3D vector fields
  * 7-D - layered 3D vector fields (e.g. gravity and electromagnetism layers)
  * 8-D - batches of layered 3D vector fields 
  * 9-D - batches of layered 3D vector fields evolving through time
  * keep going...

* GPU memory is expensive, so Tensors are most commonly 4-D or less.

* It helps to visualize a Tensor as a Rubik's Cube: each cell holds a piece of scalar data (like a weight, a piece of input data, or a label).  For higher dimensional Tensors, think of each cell as holding a Tensor instead of a scalar.
![](https://i.imgur.com/KyOQVX9.png)
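
To make the list of ranks above concrete, here is a minimal NumPy sketch. The array names and sizes are illustrative assumptions; only the 16 x 64 x 64 x 1 batch shape matches what this notebook feeds to the network later on.

```python
import numpy as np

# Illustrative tensors of increasing rank (names and sizes are assumptions).
scalar = np.array(3.0)                    # 0-D: a single number
vector = np.zeros(5)                      # 1-D: a vector
matrix = np.zeros((4, 5))                 # 2-D: a table of data
image_batch = np.zeros((16, 64, 64, 1))   # 4-D: batch x height x width x channels

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image_batch", image_batch)]:
    print(name, "rank:", t.ndim, "shape:", t.shape)

# Indexing one cell of a higher-rank tensor yields a lower-rank tensor.
one_image = image_batch[0]                # 3-D: height x width x channels
```

Indexing along the first axis is the "each cell holds a Tensor" picture: one entry of the 4-D batch is itself a 3-D image.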

This notebook is based on the Toronto AI Noise-Circle Synthetic Dataset (https://github.com/toronto-ai/noise-circle-dataset), modified to run in the Google Colab environment using Keras.

In [1]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 18067462837903200340, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 11297803469
 locality {
   bus_id: 1
 }
 incarnation: 12397279698801048631
 physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"]

Mapping your Google Drive space to Colab

Ref: https://www.kdnuggets.com/2018/02/google-colab-free-gpu-tutorial-tensorflow-keras-pytorch.html


In [2]:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
··········
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
Please enter the verification code: Access token retrieved correctly.


In [21]:
import time
import os
import json
import datetime

!mkdir -p drive
!google-drive-ocamlfuse drive

# use mapped google drive url instead
#DIRECTORY = os.path.join("~", "ai")
DIRECTORY = '/content/drive/ai' # specific to google colab environment

# check the drive is properly mapped
!ls /content/drive

ai		     cv		   hackability - aidex	test
aidexDriveLocal.zip  dnd	   id_rsa_art3mis.pub	Welcome to Coda
app		     dsb2018	   notepad		Workspace
Colab Notebooks      Family Photo  PEO-MC 2018


In [4]:
# https://keras.io/
!pip install -q keras

from __future__ import print_function
import keras
from keras.models import Sequential
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Flatten, Lambda
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
from keras.losses import mean_squared_error
from keras.optimizers import  Adam
from keras.callbacks import Callback, ModelCheckpoint
from keras.models import model_from_json
from keras import backend as K

Using TensorFlow backend.


In [0]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import random, math
from matplotlib import animation, rc
from IPython.display import HTML

In [0]:
plt.rcParams['image.cmap'] = 'viridis'

np.random.seed(20180118)

BATCH_SIZE = 16
DIM = 64
TWO_PI = 2.*math.pi
MIN_CIRCLE_WIDTH = int(DIM/10)
NUM_CIRCLE_DOTS = 500
CIRCLE_WIDTH = 3
DARKNESS = .5

DISPLAY_RATE = 100

In [0]:
# We'll use this to create our data set.
def add_circles(data):
    radius = int((random.random() * (DIM/2 - MIN_CIRCLE_WIDTH)) + MIN_CIRCLE_WIDTH)
    xpos = random.random()*DIM
    ypos = random.random()*DIM
    
    draw_circle(data, xpos, ypos, radius, DARKNESS, CIRCLE_WIDTH)

    return [xpos, ypos, radius+CIRCLE_WIDTH/2]


def draw_circle(data, xpos, ypos, radius, darkness, width):
    for i in range(NUM_CIRCLE_DOTS):
        for r in range(radius, radius+width):
            rad = TWO_PI * i/NUM_CIRCLE_DOTS
            x = int(round(r*math.cos(rad)+xpos))
            y = int(round(-r*math.sin(rad)+ypos))
            if x >= 0 and x < DIM and y >= 0 and y < DIM:
                data[x,y] = data[x,y] - darkness

# Create random noise and draw circles in it
def create_dataset_row():
    data = np.random.random((DIM, DIM)).astype(np.float32)
    label = add_circles(data)
    label = np.array(label).astype(np.float32)
    
    return (data, label)
    

def create_dataset(rows):
    
    labels  = []
    samples = []
    for i in range(rows):
        data, label = create_dataset_row()
        labels.append(label)
        samples.append(data)
    return (np.array(samples).astype(np.float32), np.array(labels).astype(np.float32))

def plot_dataset(data):
    fig, ax = plt.subplots()

    def init():
        ax.cla()
        return ()

    def animate(i):
        ax.imshow(data[i%len(data)])
        return ()

    anim = animation.FuncAnimation(fig, animate, init_func=init, frames=BATCH_SIZE, interval=700, blit=True)
    plt.close(fig)
    fig.set_size_inches(10, 10, True)
    return HTML(anim.to_jshtml())


In [8]:
# Create an animation so we can see our data set
dataset = create_dataset(BATCH_SIZE)
data   = dataset[0]
labels = dataset[1]

plot_dataset(data)

# GOALS

* Predict the radius of the circle
* Predict the center of the circle

We are going to create a neural net to solve this problem for us, since it would be very difficult to solve this with conventional code.

![](https://i.imgur.com/o2qIsu4.png)

Build the Keras Model here

input: BATCH_SIZE x 64 x 64 x 1

TRUNK:
C1 - CONV: 32 filters, 7x7 kernel, relu
C2 - CONV: 32 filters, 7x7 kernel, relu
C3 - CONV: 32 filters, 7x7 kernel, relu

C4 - CONV: 32 filters, 7x7 kernel, relu
A4 - AVG POOLING: 2x2, strides 2

C5 - CONV: 32 filters, 7x7 kernel, relu
A5 - AVG POOLING: 2x2, strides 2

C6 - CONV: 32 filters, 7x7 kernel, relu
A6 - AVG POOLING: 2x2, strides 2

output: BATCH_SIZE x 8 x 8 x 32

HEAD (3x, one for xpos, one for ypos and one for radius)

D7x - DENSE: 32, elu
D8x - DENSE: 32, elu
D9x - DENSE: 32, elu

output: DENSE: 1, linear (the code uses this in place of a sum over the last dense layer)
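
The trunk's output shape can be checked with a few lines of arithmetic. This is a sketch that assumes 'same' padding for every convolution (so only the pooling layers change the spatial size) and a hypothetical helper name, `trunk_output_shape`.

```python
def trunk_output_shape(dim=64, filters=32, num_pools=3, pool_stride=2):
    """Spatial size after the trunk, assuming 'same'-padded convolutions."""
    size = dim
    for _ in range(num_pools):
        size //= pool_stride          # each 2x2, stride-2 pooling halves width and height
    return (size, size, filters)

height, width, channels = trunk_output_shape()
flat_features = height * width * channels   # size of the flattened trunk output

print("trunk output:", (height, width, channels))
print("flattened features per example:", flat_features)
```

The flattened size matters later, when the trunk output is split between the three heads.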

In [0]:
# Setup the input to the network

# check the CPU/GPU image convention
# based on keras mnist example
if K.image_data_format() == 'channels_first':
    input_shape = (1, DIM, DIM)
else:
    input_shape = (DIM, DIM, 1)
    
#main_input = Input(shape=input_shape, dtype='float32', name='main_input')

# Batch version
main_input = Input(batch_shape=(BATCH_SIZE,)+input_shape, dtype='float32', name='main_input')


# Convolutional Layers


* A convolutional layer is used to extract 'feature maps' from the previous layer.
* Feature maps are called 'channels' in TensorFlow.
* Channels are familiar: images often have three feature maps, the Red, Green and Blue channels.

* You can stack convolution layers.  Learn basic features from an image in one layer, then learn features of those features in the next layer.

* Convolutional layers can be employed to learn dozens of feature maps from the data.


![](https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png)
_Image credits: Aphex34 (Wikimedia Commons)_





### Convolutional layers learn feature maps, one for each filter.
* Each filter acts like a small window through which it can read a specific feature from each channel of the input.
* Each filter passes across the entire image and at each position combines the features detected in the input channels into an output channel.
* A convolutional layer with many filters will have an output channel for each filter, holding the results.
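
The sliding-window behaviour described above can be sketched in plain NumPy. This is an illustration rather than how Keras implements `Conv2D`: it uses no padding ('valid') and explicit loops for readability, and the function names and toy shapes are assumptions.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Slide one filter over an image.

    image:  (H, W, C_in)
    kernel: (kH, kW, C_in)
    returns one output channel of shape (H - kH + 1, W - kW + 1)
    """
    H, W, _ = image.shape
    kH, kW, _ = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1), dtype=image.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Combine every input channel inside the window into one number.
            out[i, j] = np.sum(image[i:i + kH, j:j + kW, :] * kernel)
    return out

def conv2d(image, kernels):
    """Apply several filters; each filter produces one output channel."""
    return np.stack([conv2d_single_filter(image, k) for k in kernels], axis=-1)

rng = np.random.RandomState(0)
image = rng.rand(8, 8, 3).astype(np.float32)        # toy RGB-like input
kernels = rng.rand(4, 3, 3, 3).astype(np.float32)   # 4 filters, 3x3 window, 3 input channels
features = conv2d(image, kernels)
print(features.shape)
```

With four filters the output has four channels, one feature map per filter.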

### Activation functions
* To add 'depth' to our convolutional layer (i.e. the depth in deep learning), we need to apply a non-linearity, called an activation function, to the output.
* A Leaky ReLU is nonlinear and simple, and ELUs are a great choice as well.  The Keras code in this notebook uses ReLU in the convolutional layers and ELU in the dense layers.

<table>
    <tr>
        <th>tf.nn.leaky_relu</th><td><img src="https://i.imgur.com/KxYFRIL.png" alt="Drawing" style="width: 200px;"/></td>
    </tr>
    <tr><th>Other activation functions</th><td>
        <p><code>tf.nn.relu</code> <i>(dense, convolution)</i></p>
            <p><code>tf.nn.elu</code> <i>(dense, convolution)</i></p>
            <p><code>tf.nn.softplus</code> <i>(dense, convolution)</i></p>
            <p><code>tf.nn.sigmoid</code> <i>(0 to 1 classifier)</i></p>
            <p><code>tf.nn.tanh</code> <i>(LSTM, -1 to 1 classifier)</i></p>
    </td></tr>
</table>


Further reading on activation functions:
* https://arxiv.org/abs/1505.00853 - Empirical Evaluation of Rectified Activations in Convolutional Network
* https://arxiv.org/abs/1709.06247 - Training Better CNNs Requires to Rethink ReLU
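
For reference, three of the activation functions named above can be written in a few lines of NumPy. This is a minimal sketch; the `alpha` defaults (0.2 for the leaky ReLU, 1.0 for the ELU) are assumptions chosen for illustration.

```python
import numpy as np

def relu(x):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.2):
    """Like ReLU, but negative inputs keep a small slope `alpha`."""
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    """Identity for positive inputs, smooth saturation towards -alpha for negative ones."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

inputs = np.linspace(-3.0, 3.0, 7)
for fn in (relu, leaky_relu, elu):
    print(fn.__name__, np.round(fn(inputs), 3))
```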

# Pooling

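* Pooling shrinks a layer by summarizing each small window of values (for example with its average or maximum).
* Using a stride of two will halve the width and height.
* The pool_size should be at least as large as the strides, otherwise some values are skipped.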
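
As an illustration of the halving effect, here is a small NumPy sketch of 2x2 average pooling with a stride of 2 on a single channels-last feature map. The helper name `avg_pool_2x2` is an assumption, and this is not how Keras implements `AveragePooling2D`.

```python
import numpy as np

def avg_pool_2x2(x):
    """2x2 average pooling with stride 2 on a (H, W, C) array."""
    H, W, C = x.shape
    x = x[:H // 2 * 2, :W // 2 * 2, :]            # drop a trailing odd row/column, if any
    blocks = x.reshape(H // 2, 2, W // 2, 2, C)   # group pixels into 2x2 blocks
    return blocks.mean(axis=(1, 3))               # average inside each block

feature_map = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
pooled = avg_pool_2x2(feature_map)
print(pooled[:, :, 0])

# Three rounds of pooling take a 64x64 map down to 8x8.
trunk = np.zeros((64, 64, 32), dtype=np.float32)
for _ in range(3):
    trunk = avg_pool_2x2(trunk)
print(trunk.shape)
```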

In [0]:
# number of filters per conv2d layers
FS = 32

# conv2d kernel window size
KerSz  = 7

#Ref:
# keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', 
#   data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, 
#   kernel_initializer='glorot_uniform', bias_initializer='zeros', 
#   kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, 
#   kernel_constraint=None, bias_constraint=None)
conv = Conv2D(FS, (KerSz, KerSz), padding='same',activation='relu',kernel_initializer='glorot_normal')

# keras.layers.AveragePooling2D(pool_size=(2, 2), strides=None, padding='valid', data_format=None)
avgp = AveragePooling2D(pool_size=(2, 2), strides=(2,2))
maxp = MaxPooling2D(pool_size=(2, 2), strides=(2,2))


# Creating our graph

## Subgraph: The convolutional layers

* We are stacking our convolutional layers, so that later layers detect features on lower layers.
* Higher layers learn higher level features from lower layers that learn lower level features.

In [0]:
# C1
# without the input_shape at first conv2d it won't work!
x = Conv2D(FS, (KerSz, KerSz), padding='same',activation='relu', 
           input_shape=input_shape)(main_input)

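# NOTE: `conv` is a single layer instance, so every call below reuses the same
# weights (C2 to C6 share one kernel); model.summary() lists it only once.
x = conv(x) # C2
x = conv(x) # C3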

x = conv(x) # C4
x = avgp(x) # A4 (swap out with maxp if you want)

x = conv(x) # C5
x = avgp(x) # A5 (swap out with maxp if you want)

x = conv(x) # C6
x = avgp(x) # A6 (swap out with maxp if you want)

# flatten the conv2d layer output to make it a table, one row for each example 
# in the batch.
xf = Flatten()(x)

In [0]:
# make lambda layers between the trunk and head
# https://keras.io/layers/core/
# ref: https://github.com/keras-team/keras/issues/890
# keras.layers.Lambda(function, output_shape=None, mask=None, arguments=None)
#
# split the conv2d layer output for x,y,r estimation branches

# a third of the flattened feature count (8*8*32 = 2048 features, so 682 per branch)
a_third = K.int_shape(xf)[1] // 3

xx = Lambda(lambda x: x[:,0*a_third:(0+1)*a_third])(xf)
yy = Lambda(lambda x: x[:,1*a_third:(1+1)*a_third])(xf)
rr = Lambda(lambda x: x[:,2*a_third:(2+1)*a_third])(xf)

## Subgraph: The fully connected dense layers

* These layers are used to convert the tensor that was output from the convolutional layers down into a prediction.
* In our case, we want 3 outputs:
  * The X and Y coordinates of the center of the circle
  * The radius of the circle
* Here we're dividing the flattened trunk output into thirds, and we have attached one subnet of fully connected layers to each third, one per output.
* Each subnet produces the corresponding prediction (for x, y, radius).  The original design summed the outputs of the last dense layer; the code below uses a final single-unit linear Dense layer instead.
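
The split into thirds is plain slicing along the feature axis. The NumPy sketch below mirrors the Lambda layers on a stand-in array; the array contents are made up, and only the shapes (a batch of 16 with 8*8*32 flattened features) follow the notebook.

```python
import numpy as np

batch_size, flat_features = 16, 8 * 8 * 32
# Stand-in for the flattened trunk output (values are arbitrary).
flat = np.arange(batch_size * flat_features, dtype=np.float32).reshape(batch_size, flat_features)

a_third = flat.shape[1] // 3
xx = flat[:, 0 * a_third:1 * a_third]   # features fed to the x head
yy = flat[:, 1 * a_third:2 * a_third]   # features fed to the y head
rr = flat[:, 2 * a_third:3 * a_third]   # features fed to the radius head

# The feature count is not divisible by 3, so a few trailing features go unused.
unused = flat.shape[1] - 3 * a_third
print(a_third, xx.shape, yy.shape, rr.shape, "unused:", unused)
```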

In [0]:
# feed the input to the sub networks for regression (e.g.x_guess, y_guess etc.)

# number of neurons in the hidden layers
DS = 32 
af = 'elu'

# Reference:
# keras.layers.Dense(units, activation=None, use_bias=True, 
#   kernel_initializer='glorot_uniform', bias_initializer='zeros', 
#   kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, 
#   kernel_constraint=None, bias_constraint=None)

def denseSubnet(name,input):
  layer = Dense(DS,activation=af,kernel_initializer='glorot_normal')(input)
  layer = Dense(DS,activation=af,kernel_initializer='glorot_normal')(layer)
  layer = Dense(DS,activation=af,kernel_initializer='glorot_normal')(layer)
  # instead of using sum (which doesn't work here)
  #   return Lambda(lambda x: K.sum(x, axis=1))(layer)
  layer = Dense(1,activation='linear',name=name)(layer) 
  return layer

x_guess = denseSubnet('x_guess',xx)
y_guess = denseSubnet('y_guess',yy)
r_guess = denseSubnet('r_guess',rr)

# try without splitting up the conv2d output later
#x_guess = denseSubnet('x_guess',xf)
#y_guess = denseSubnet('y_guess',xf)
#r_guess = denseSubnet('r_guess',xf)


In [14]:
model = Model(inputs=[main_input], outputs=[x_guess,y_guess,r_guess])

# display the model
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
main_input (InputLayer)         (16, 64, 64, 1)      0                                            
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (16, 64, 64, 32)     1600        main_input[0][0]                 
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               multiple             50208       conv2d_2[0][0]                   
                                                                 conv2d_1[0][0]                   
                                                                 conv2d_1[1][0]                   
                                                                 average_pooling2d_1[0][0]        
          

# Objective function (a.k.a. the loss function)

* The objective function is what the system attempts to minimize
* The most important thing to remember is that the loss function needs to be differentiable with a minimum value at your goal.
* Convex functions are easy to minimize.

### Common objectives
* Minimizing the difference of squares (a.k.a. mean squared error)

* Minimizing the log loss - this is useful in classification tasks when dealing with probabilities.
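
Both objectives can be written directly in NumPy. The sketch below is for illustration only (the notebook itself uses the Keras `mean_squared_error`); the function names, the binary form of the log loss, and the clipping constant `eps` are assumptions.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return np.mean((y_pred - y_true) ** 2)

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary log loss; p_pred holds predicted probabilities of class 1."""
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(p_pred, dtype=np.float64), eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

print("MSE:", mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print("log loss, confident and right:", log_loss([1, 0], [0.9, 0.1]))
print("log loss, confident and wrong:", log_loss([1, 0], [0.1, 0.9]))
```

Both functions reach their minimum of zero when the predictions match the targets exactly, which is the property the bullet on differentiability and minima asks for.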

In [0]:
# Ref:
# https://github.com/keras-team/keras/blob/master/keras/losses.py
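def loss_metric_helper(y_true, y_pred):
  # NOTE: Keras applies the loss to each output (x_guess, y_guess, r_guess)
  # separately, so y_true and y_pred hold one quantity for the whole batch;
  # the [0], [1], [2] indices below select batch elements, not x, y and r.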
  actual_x, actual_y, actual_r = y_true[0], y_true[1], y_true[2]
  x,y,r = y_pred[0], y_pred[1], y_pred[2]
  
  # Define an error between the predicted x, y and radius, and their actual values
  # We will later ask the network to minimize this error through gradient descent
  err = mean_squared_error(y_true, y_pred)
  x_error, y_error, r_error = err[0],err[1],err[2]
  
  return x_error, y_error, r_error

def loss_function(y_true, y_pred):
  x_error, y_error, r_error = loss_metric_helper(y_true, y_pred)
  
  # Let's combine these individual loss metrics into a combined loss function
  # Note: these are vectors - with separate losses defined for each element in the batch 
  return x_error + y_error + r_error

def avg_dist_metric(y_true, y_pred):
  x_error, y_error, _ = loss_metric_helper(y_true, y_pred)
    
  # Now lets create some metrics that we can use to evaluate our progress
  avg_distance_from_actual_center = x_error + y_error
  return avg_distance_from_actual_center

def avg_radius_error_metric(y_true, y_pred):
  _, _, r_error = loss_metric_helper(y_true, y_pred)
  avg_radius_error = r_error
  return avg_radius_error

# Optimizer

There are many choices for optimizers.
For most applications, the Adam optimizer will give you good flexibility and fast training.

## Adam

* The Adam optimizer is a gradient descent optimization algorithm that adds two things:
  * First, it adds momentum to each weight of your model to help it descend.
  * Second, it slows down weights proportionally to how much they are oscillating.
  
* Both effects of the Adam optimizer are tracked with exponentially decaying averages.  The decay rates, together with the learning rate, are parameters to the optimizer:
  * alpha - The learning rate.  Typical values are 0.0003 to 0.000003.
  * beta1 - The decay rate of the momentum term.  Typical values are 0.5 to 0.9.
  * beta2 - The decay rate of the variance term.  Typical values are 0.9 to 0.999.



Further reading: 
* https://arxiv.org/abs/1412.6980 Adam: A Method for Stochastic Optimization
* http://ruder.io/optimizing-gradient-descent/ An overview of gradient descent optimization algorithms
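
To show how alpha, beta1 and beta2 enter the update, here is a minimal NumPy sketch of the Adam rule from the paper linked above, applied to a toy quadratic. The function name, the toy problem, and the learning rate used in the demo are assumptions; Keras' own `Adam` is what the notebook actually trains with.

```python
import numpy as np

def adam_minimize(grad_fn, w0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a function given its gradient, using the Adam update rule."""
    w = np.asarray(w0, dtype=np.float64).copy()
    m = np.zeros_like(w)   # decaying average of gradients (momentum term)
    v = np.zeros_like(w)   # decaying average of squared gradients (variance term)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)   # bias correction for the zero initialization
        v_hat = v / (1.0 - beta2 ** t)
        w -= alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy problem: minimize the squared distance to a fixed target.
target = np.array([3.0, -2.0])
w_final = adam_minimize(lambda w: 2.0 * (w - target), w0=[0.0, 0.0], alpha=0.05, steps=2000)
print(w_final)
```

Dividing by the square root of the variance term is what slows down weights whose gradients are large or oscillating.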


In [0]:
# setup the optimizer
# ref: https://keras.io/optimizers/
# keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

ALPHA = 0.001
BETA1 = 0.9
BETA2 = 0.999
EPSILON=None #1e-9

opt = Adam(lr=ALPHA,beta_1=BETA1,beta_2=BETA2,epsilon=EPSILON)


# Connecting the model pieces together

In [0]:
# try to compile the model
model.compile(loss=loss_function, 
              optimizer=opt,
              metrics=[avg_dist_metric, avg_radius_error_metric])

# Preparing a Training run

Define a batch generator that yields an endless stream of freshly generated samples, following the Keras generator convention.

Ref:
https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly.html
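
The generator convention boils down to a Python generator that never stops and yields one `(inputs, targets)` pair per batch. The sketch below is a stand-in: it yields random noise and random labels instead of the circle dataset, and the function name and seed are assumptions.

```python
import itertools
import numpy as np

def batch_generator(batch_size=16, dim=64, seed=0):
    """Yield (samples, [x_targets, y_targets, r_targets]) forever."""
    rng = np.random.RandomState(seed)
    while True:
        # Stand-in data: random noise images and random labels.
        samples = rng.rand(batch_size, dim, dim, 1).astype(np.float32)
        labels = (rng.rand(batch_size, 3) * dim).astype(np.float32)
        # One target array per model output, in the same order as the outputs.
        yield samples, [labels[:, 0], labels[:, 1], labels[:, 2]]

gen = batch_generator()
first_batches = list(itertools.islice(gen, 3))   # take three batches from the endless stream
samples, targets = first_batches[0]
print(samples.shape, [t.shape for t in targets])
```

Because the stream is endless, the training call has to say how many batches make up an epoch, which is what `steps_per_epoch` is for.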


In [0]:
class DataGenerator(object):
  def __init__(self, dim_x = 32, dim_y = 32, dim_z = 32, batch_size = 32, shuffle = True):
    'Initialization'
    self.dim_x = dim_x
    self.dim_y = dim_y
    self.batch_size = batch_size
    self.shuffle = shuffle
    
  def generate(self):
    'Generates batches of samples'
    # Infinite loop
    while 1:
      # Generate order of exploration of dataset
      # TODO: check for channel order and adj the order
      samples = np.zeros((BATCH_SIZE,DIM,DIM,1))
      ds = create_dataset(BATCH_SIZE)
      samples[:,:,:,0] = ds[0]
      labels = ds[1]
      yield (samples,[labels[:,0],labels[:,1],labels[:,2]])

  def generateOne(self):
    # one sample batch output
    samples = np.zeros((BATCH_SIZE,DIM,DIM,1))
    ds = create_dataset(BATCH_SIZE)
    samples[:,:,:,0] = ds[0]
    labels = ds[1]
    return (samples,[labels[:,0],labels[:,1],labels[:,2]])
      
# Parameters
params = {'dim_x': DIM,
          'dim_y': DIM,
          'batch_size': BATCH_SIZE,
          'shuffle': False}

generator = DataGenerator(**params).generate()      

Define a callback that logs training progress and saves the model when training ends


In [0]:
# Ref:
# https://keras.io/callbacks/#example-recording-loss-history
class LossHistory(keras.callbacks.Callback):
  def __init__(self):
    self.idx = 0
    self.ema_radius = 0
    self.ema_center = 0
    
  def on_train_begin(self, logs={}):
    self.losses = []

  def on_batch_end(self, batch, logs={}):
    #self.losses.append(logs.get('metric'))  
    xErr = logs.get('x_guess_avg_dist_metric')
    yErr = logs.get('y_guess_avg_dist_metric')
    rErr = logs.get('r_guess_avg_radius_error_metric')
    
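    # NOTE: these are running sums that get divided by DISPLAY_RATE at each
    # report (the previous average carries into the next window); they are
    # not true exponential moving averages.
    self.ema_radius += rErr
    self.ema_center += (xErr + yErr)/2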

    if not (self.idx % DISPLAY_RATE):
      self.ema_radius /= DISPLAY_RATE
      self.ema_center /= DISPLAY_RATE
      print("step: {step}, \
        EMA radius error: {ema_radius:.0f}px. \
        EMA center error: {ema_center:.0f}px".format(step=self.idx, 
                                                     ema_radius=self.ema_radius, 
                                                     ema_center=self.ema_center))
    self.idx += 1
    
  def on_train_end(self, logs={}):
    wfn = 'model_weight_' + datetime.datetime.now().strftime("%y-%m-%d-%H-%M") + '.hdf5'
    model.save_weights(DIRECTORY+'/'+wfn)

    jfn = 'model_' + datetime.datetime.now().strftime("%y-%m-%d-%H-%M") + '.json'
    with open(DIRECTORY+'/'+jfn, 'w') as outfile:
      json_string = model.to_json()
      json.dump(json_string, outfile)
      
    print("Model saved: {}".format(wfn))
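
The log messages above label the tracked errors "EMA". For reference, a textbook exponential moving average could be sketched as follows; the class name and the decay value are assumptions and are not part of the notebook.

```python
class ExponentialMovingAverage(object):
    """Smooth a noisy stream of values: new = decay * old + (1 - decay) * x."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.value = None

    def update(self, x):
        if self.value is None:
            self.value = float(x)   # start from the first observation
        else:
            self.value = self.decay * self.value + (1.0 - self.decay) * float(x)
        return self.value

ema = ExponentialMovingAverage(decay=0.5)
smoothed = [ema.update(x) for x in [4.0, 0.0, 0.0]]
print(smoothed)
```

A higher decay gives a smoother but slower-moving estimate, which is usually what one wants for per-batch training metrics.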


# Start Training!

Train the model with the generator.

In [29]:
np.random.seed(20180118)

history = LossHistory()

# load existing weights if the file exists
weights_path = DIRECTORY+'/model_weight_18-03-06-19-35.h5'
if os.path.exists(weights_path):
  model.load_weights(weights_path, by_name=False)

start_time = time.time()
model.fit_generator(generator = generator,
                    steps_per_epoch=1, epochs=1000,
                    workers = 0 , # This is important
                    verbose = 0,
                    callbacks=[history])
print("--- %s seconds ---" % round(time.time() - start_time, 2))

step: 0,         EMA radius error: 0px.         EMA center error: 2px
step: 100,         EMA radius error: 9px.         EMA center error: 48px
step: 200,         EMA radius error: 8px.         EMA center error: 42px
step: 300,         EMA radius error: 10px.         EMA center error: 54px
step: 400,         EMA radius error: 7px.         EMA center error: 63px
step: 500,         EMA radius error: 7px.         EMA center error: 24px
step: 600,         EMA radius error: 5px.         EMA center error: 23px
step: 700,         EMA radius error: 4px.         EMA center error: 20px
step: 800,         EMA radius error: 3px.         EMA center error: 16px
step: 900,         EMA radius error: 2px.         EMA center error: 8px
Model saved: model_weight_18-03-06-23-01.hdf5
--- 186.12 seconds ---


# Visualizing the Results!

In [30]:
sample_batch, actual = DataGenerator(**params).generateOne()
prediction = model.predict(sample_batch,batch_size=BATCH_SIZE) # this seems to work ...

xp = prediction[0].reshape(-1,).astype(np.int32)
yp = prediction[1].reshape(-1,).astype(np.int32)
rp = prediction[2].reshape(-1,).astype(np.int32)
xa, ya, ra      = [np.rint(a).astype(np.int32) for a in actual]

result_display = []

# Add the prediction and the actual to the dataset
for i, sample in enumerate(sample_batch):
    draw_circle(sample, xa[i], ya[i], ra[i],  DARKNESS, CIRCLE_WIDTH)
    draw_circle(sample, xp[i], yp[i], rp[i], -DARKNESS, CIRCLE_WIDTH)  # The prediction will appear bright
    result_display.append(sample)

plot_dataset(np.array(result_display)[:,:,:,0])

Save/Export the model config

Ref: https://keras.io/models/about-keras-models/

In [32]:
wfn = 'model_weight_' + datetime.datetime.now().strftime("%y-%m-%d-%H-%M") + '.hdf5'
model.save_weights(DIRECTORY+'/'+wfn)

jfn = 'model_' + datetime.datetime.now().strftime("%y-%m-%d-%H-%M") + '.json'
with open(DIRECTORY+'/'+jfn, 'w') as outfile:
  json_string = model.to_json()
  json.dump(json_string, outfile)

# confirm file saved/exported
!ls /content/drive/ai

OSError: ignored
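
One detail of the export cell is worth spelling out: `model.to_json()` already returns a JSON string, so passing it through `json.dump` wraps it in a second layer of JSON. The sketch below shows the matching round trip using only the standard library; the architecture string is a made-up stand-in for `model.to_json()`, and an in-memory buffer replaces the file on Drive.

```python
import io
import json

# Stand-in for model.to_json(): a made-up architecture string for illustration.
architecture_json = json.dumps({"class_name": "Model", "config": {"name": "model_1"}})

# Saving as in the export cell: json.dump encodes the JSON string once more.
buffer = io.StringIO()
json.dump(architecture_json, buffer)

# Loading: json.load removes the outer layer and returns the original string,
# which is the form that keras.models.model_from_json expects.
buffer.seek(0)
restored_json = json.load(buffer)

# Parsing a second time gives the actual configuration dictionary.
config = json.loads(restored_json)
print(config["class_name"])
```

When reloading a saved model, the string returned by `json.load` would then be passed to `model_from_json`, followed by `load_weights` on the rebuilt model.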