In [1]:
import os
import numpy as np

import tensorflow as tf
from tensorflow.python.keras.datasets import boston_housing
from tensorflow.contrib.eager.python import tfe




In [2]:
# Enable Eager execution mode
tf.enable_eager_execution()
tf.set_random_seed(0)
np.random.seed(0)

In [3]:
# constants
batch_size = 128
epochs = 25

In [4]:
# dataset loading
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

# normalization of dataset
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)

x_train = (x_train - mean) / (std + 1e-8)
x_test = (x_test - mean) / (std + 1e-8)

print('x train', x_train.shape, x_train.mean(), x_train.std())
print('y train', y_train.shape, y_train.mean(), y_train.std())
print('x test', x_test.shape, x_test.mean(), x_test.std())
print('y test', y_test.shape, y_test.mean(), y_test.std())

x train (404, 13) 2.6029783389231392e-15 0.9999999879626582
y train (404,) 22.395049504950492 9.199035423364862
x test (102, 13) 0.020826991529340172 0.9836083314719052
y test (102,) 23.07843137254902 9.123806690181466
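The test set is standardized with the *training* mean and standard deviation. That is why `x test` above has a mean of about 0.02 and a standard deviation of about 0.98 rather than exactly 0 and 1. Here is a minimal numpy sketch of the same z-score step. `standardize` is a hypothetical helper, and the random arrays only stand in for the real Boston Housing features.

```python
import numpy as np

def standardize(x_train, x_test, eps=1e-8):
    """Z-score both splits using statistics computed on the training split only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    return (x_train - mean) / (std + eps), (x_test - mean) / (std + eps)

rng = np.random.default_rng(0)
# Stand-in data with the same number of features as Boston Housing (13 columns)
train = rng.normal(loc=5.0, scale=3.0, size=(404, 13))
test = rng.normal(loc=5.0, scale=3.0, size=(102, 13))

train_z, test_z = standardize(train, test)
print(train_z.mean(), train_z.std())  # ~0 and ~1 by construction
print(test_z.mean(), test_z.std())    # close to, but not exactly, 0 and 1
```

Reusing the training statistics keeps the test set unseen during preprocessing, so the small offset in the test statistics is expected.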


# Create the Keras Model

The canonical way to define a model in Eager mode is to subclass `tf.keras.Model`, which tracks every Keras layer (or nested Keras Model) assigned as an attribute of the class.

One idiosyncrasy of how this model's layers are managed is that when you build the model and call `Model.summary()`, **the layers are listed in the order in which they were assigned inside `__init__`, NOT the order in which they are called inside `call()`**.

Another issue is that this approach assumes you know how many layers you will need before `call`. That may not be the case for architectures like ResNet or Inception, where you may add more layers to increase depth. This can be handled by assigning layers dynamically with `setattr(self, 'some_unique_key_name', layer)`. This technique will be shown later in `05_inception.py` and `05_resnet.py`.
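The `setattr` pattern can be sketched without TensorFlow. The hypothetical `DynamicStack` below creates `num_layers` layers in `__init__` under unique attribute names and fetches them with `getattr` when it is called. The numpy weight matrices are only stand-ins for Keras layers. In a real `tf.keras.Model`, each attribute would hold a `tf.keras.layers.Dense` (or similar), and assigning it through `setattr` is what lets Keras track it. Plain Python attributes also keep their assignment order, which matches the `summary()` ordering behaviour described above.

```python
import numpy as np

class DynamicStack:
    """Hypothetical model whose depth is only known at construction time."""

    def __init__(self, num_layers, width, seed=0):
        rng = np.random.default_rng(seed)
        self.num_layers = num_layers
        for i in range(num_layers):
            # One unique attribute name per layer: 'dense_0', 'dense_1', ...
            weights = rng.normal(scale=0.1, size=(width, width))
            setattr(self, 'dense_%d' % i, weights)

    def call(self, inputs):
        x = inputs
        for i in range(self.num_layers):
            w = getattr(self, 'dense_%d' % i)
            x = np.maximum(x @ w, 0.0)  # linear map followed by ReLU
        return x

model = DynamicStack(num_layers=3, width=13)
out = model.call(np.ones((4, 13)))
print(out.shape)  # (4, 13)
print([name for name in vars(model) if name.startswith('dense_')])
# ['dense_0', 'dense_1', 'dense_2']
```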

In [None]:
# model definition (canonical way)
class Regressor(tf.keras.Model):

    def __init__(self):
        super(Regressor, self).__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs, training=None, mask=None):
        output = self.dense(inputs)
        return output

# Training
The benefit of extending `tf.keras.Model` is that you can use **all of the utility functions provided by Keras** just as you normally would.

However, there is currently a minor bug in TF < 1.8: if you call `Model.fit()` right after building the model, it will **try to pass the entire dataset (the X and Y arrays passed to `.fit()` as arguments) through the model to determine the shapes of its layers**.

Lazy layer building is generally great because it removes PyTorch's requirement to know the input dimensions of every layer. However, it causes a severe problem if you pass a dataset such as MNIST (60000, 784) as a single "batch" to determine the shapes. Almost all small GPUs will choke and raise an OOM error at that point, and for larger models with hundreds of layers, even a 1080Ti will take a long time to handle it.
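Lazy building means a layer only allocates its weights the first time it sees data, and it reads the input dimension off that batch. The numpy sketch below illustrates the idea. The `LazyDense` class is hypothetical and is not Keras's implementation. It shows why a single dummy row is enough: only the last axis of the input decides the kernel shape, so pushing all 404 samples (or all 60000 MNIST images) through the build step gains nothing.

```python
import numpy as np

class LazyDense:
    """Hypothetical dense layer that creates its weights on the first call."""

    def __init__(self, units, seed=0):
        self.units = units
        self.kernel = None
        self.bias = None
        self._rng = np.random.default_rng(seed)

    def build(self, input_dim):
        self.kernel = self._rng.normal(scale=0.1, size=(input_dim, self.units))
        self.bias = np.zeros(self.units)

    def __call__(self, inputs):
        if self.kernel is None:
            # Only the feature dimension matters; the batch size is irrelevant
            self.build(inputs.shape[-1])
        return inputs @ self.kernel + self.bias

layer_a = LazyDense(1)
layer_a(np.zeros((1, 13)))    # build from a single dummy sample
layer_b = LazyDense(1)
layer_b(np.zeros((404, 13)))  # build from the whole training set
print(layer_a.kernel.shape, layer_b.kernel.shape)  # (13, 1) (13, 1)
```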

Fortunately, there are several easy fixes:

- Use `Model.fit_generator()` instead of `Model.fit()`. Since the generator only passes the first batch to determine the shapes, it won't cause an issue.
- Build the model explicitly by calling `Model._set_inputs()` with a dummy TensorFlow batch containing a single sample with the dataset's shape (for MNIST, that is (1, 784)). I'll be using this throughout, since it makes more sense than writing generators for `fit_generator` for small datasets like MNIST, Fashion MNIST and Boston Housing.
- Write your own training loop. But since we are using a Keras Model, why bother with that unless we absolutely have to?
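For the first option, Keras's `fit_generator()` expects a generator that yields `(x_batch, y_batch)` tuples indefinitely, one batch at a time. Here is a minimal sketch. `batch_generator` is a hypothetical helper, and it reshuffles the data on every pass.

```python
import numpy as np

def batch_generator(x, y, batch_size, seed=0):
    """Yield shuffled (x_batch, y_batch) tuples forever, as fit_generator expects."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:
        order = rng.permutation(n)  # new shuffle on every pass over the data
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]

x = np.zeros((404, 13))
y = np.zeros(404)
gen = batch_generator(x, y, batch_size=128)
shapes = [next(gen)[0].shape for _ in range(4)]
print(shapes)  # [(128, 13), (128, 13), (128, 13), (20, 13)]
```

With the notebook's arrays, the call would look roughly like `model.fit_generator(batch_generator(x_train, y_train, batch_size), steps_per_epoch=int(np.ceil(len(x_train) / batch_size)), epochs=epochs)`. `steps_per_epoch` tells Keras where one epoch ends, because the generator itself never stops.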

# Note on building models
It is best to build the model once, right after creating it, by explicitly calling `Model._set_inputs()` with a TensorFlow batch. Do this before performing any task (`Model.fit()/evaluate()/predict()`) and before loading a trained model checkpoint.

Generally, always build your model once before doing **anything.**

In [6]:
device = '/cpu:0' if tfe.num_gpus() == 0 else '/gpu:0'

with tf.device(device):
    # build model and optimizer
    model = Regressor()
    model.compile(optimizer=tf.train.GradientDescentOptimizer(0.1), loss='mse')

    # suggested fix for TF <= 2.0; can be incorporated inside `_eager_set_inputs` or `_set_inputs`
    # Fix: use a single dummy sample with the dataset's feature shape to determine the model's input/output shapes
    dummy_x = tf.zeros((1, 13))
    model._set_inputs(dummy_x)

    # train
    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
              validation_data=(x_test, y_test))

    # evaluate on test set
    scores = model.evaluate(x_test, y_test, batch_size, verbose=2)
    print("Test MSE :", scores)

Train on 404 samples, validate on 102 samples
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
Test MSE : 22.48859977722168
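For comparison, the third option (writing your own training loop) takes very little code for this model. A single `Dense(1)` unit trained with MSE is plain linear regression. The numpy sketch below uses mini-batch gradient descent with the same learning rate (0.1), batch size (128) and epoch count (25). It illustrates the idea and does not reproduce the Keras run. The weights start at zero for simplicity, unlike Keras's default initializer, and the synthetic data is made up for the example.

```python
import numpy as np

def train_linear_regressor(x, y, lr=0.1, batch_size=128, epochs=25, seed=0):
    """Mini-batch gradient descent on mean squared error for y ~ x @ w + b."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.zeros(d)  # zero init for simplicity (Keras uses a random initializer)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            err = xb @ w + b - yb  # residuals, shape (batch,)
            # Gradients of mean((xb @ w + b - yb) ** 2) with respect to w and b
            grad_w = 2.0 * xb.T @ err / len(idx)
            grad_b = 2.0 * err.mean()
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

def mse(x, y, w, b):
    return float(np.mean((x @ w + b - y) ** 2))

# Synthetic standardized features with a known linear target plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=(404, 13))
true_w = rng.normal(size=13)
y = x @ true_w + 22.0 + rng.normal(scale=0.5, size=404)

w, b = train_linear_regressor(x, y)
print(mse(x, y, np.zeros(13), 0.0), '->', mse(x, y, w, b))
```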
