<a href="https://colab.research.google.com/github/zhenya-mamenko/mini-ML-piscine/blob/master/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2017 Google LLC., 2019 Zhenya Mamenko
This notebook is based on the [First Steps with TensorFlow](https://colab.research.google.com/notebooks/mlcc/first_steps_with_tensor_flow.ipynb?utm_source=zhenya-mamenko&utm_campaign=colab-external&utm_medium=referral&utm_content=firststeps-colab&hl=en) exercise from the [Google Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/).

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# First Steps with TensorFlow

**Learning Objectives:**
  * Learn fundamental TensorFlow 2.0 and Keras concepts
  * Use the `Sequential` model with a single `Dense` layer and a `linear` activation function to predict median housing price, at the granularity of city blocks, based on one input feature
  * Evaluate the accuracy of a model's predictions using Root Mean Squared Error (RMSE)
  * Improve the accuracy of a model by tuning its hyperparameters

The [data](https://developers.google.com/machine-learning/crash-course/california-housing-data-description) is based on 1990 census data from California.

## Setup
In this first cell, we'll install and load TensorFlow `2.0.0-beta1` along with the other libraries we need.

In [0]:
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import logging
from packaging import version
from IPython.display import display
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format
logging.getLogger('tensorflow').disabled = True

!pip install -q tensorflow==2.0.0-beta1

import tensorflow as tf
from tensorflow.keras import layers

Make sure that the kernel is running TensorFlow 2.0 or later. If it isn't, restart the runtime with `Ctrl+M .`

In [0]:
print(tf.__version__)
assert version.parse(tf.__version__).release[0] >= 2, \
    "This notebook requires TensorFlow 2.0 or above."

We'll use TensorBoard to visualize our work. Load the extension and a few helper libraries.

In [0]:
# Load the TensorBoard extension
%load_ext tensorboard

from datetime import datetime
import io
logging.getLogger('tensorboard').disabled = True

Next, we'll load our data set.

In [0]:
california_housing_dataframe = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_train.csv", sep=",")

We'll shuffle the data to make sure no pathological ordering effects harm the performance of Stochastic Gradient Descent. Additionally, we'll scale `median_house_value` to be in units of thousands, so that the model can learn it more easily with learning rates in a typical range.
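Shuffling reorders whole rows, so each block's feature values and its label move together. The idea can be sketched with plain NumPy arrays; the toy values, the seed, and the helper name `shuffle_rows` are illustrative assumptions rather than part of the exercise, which shuffles the DataFrame with `reindex` instead.

```python
import numpy as np


def shuffle_rows(features, labels, seed=0):
    """Shuffle features and labels with one shared permutation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(labels))
    # Indexing both arrays with the same permutation keeps each (feature, label) pair together.
    return features[perm], labels[perm]


# Toy data: three blocks with total_rooms and median_house_value in dollars.
rooms = np.array([[100.0], [200.0], [300.0]])
values = np.array([150000.0, 250000.0, 350000.0])

shuffled_rooms, shuffled_values = shuffle_rows(rooms, values)
# Scale the label to thousands of dollars, as the notebook does.
shuffled_values = shuffled_values / 1000.0
```

Shuffling features and labels with two independent permutations would silently break these pairs, which is why a single permutation is applied to both.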

In [0]:
california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))
california_housing_dataframe["median_house_value"] /= 1000.0

## Examine the Data

It's a good idea to get to know your data a little bit before you work with it.

We'll print out a quick summary of a few useful statistics on each column: count of examples, mean, standard deviation, max, min, and various quantiles.

In [0]:
california_housing_dataframe.describe()

## Build the First Model

In this exercise, we'll try to predict `median_house_value`, which will be our label (sometimes also called a target). We'll use `total_rooms` as our input feature.

**NOTE:** Our data is at the city block level, so this feature represents the total number of rooms in that block.

To train our model, we'll use the [Sequential](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/Sequential) interface provided by the TensorFlow 2.0 [Keras](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras) API, TensorFlow's high-level API for building and training models.

### Step 1: Define Features

There are two main types of data we'll use in this and future exercises:

* **Categorical Data**: Data that is textual. In this exercise, our housing data set does not contain any categorical features, but examples you might see would be the home style or the words in a real-estate ad.

* **Numerical Data**: Data that is a number (integer or float) and that you want to treat as a number. As we will discuss later, sometimes you might want to treat numerical data (e.g., a postal code) as if it were categorical.

In Keras, the data needs no special preparation: we just extract NumPy arrays from the DataFrame.

To start, we're going to use just one numeric input feature, `total_rooms`. The following code pulls the `total_rooms` data from our `california_housing_dataframe`:

In [0]:
# Define the input feature: total_rooms.
features = california_housing_dataframe[["total_rooms"]].values

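**NOTE:** Because we select the column with a list of names (`[["total_rooms"]]`), `features` is a two-dimensional array of shape `(number_of_blocks, 1)`: one row per city block, holding that block's total number of rooms. This `(samples, features)` layout is what the `Dense` layer expects as input.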

### Step 2: Define the Label

Next, we'll define our label, which is `median_house_value`. Again, we can pull it from our `california_housing_dataframe`:

In [0]:
# Define the label.
labels = california_housing_dataframe["median_house_value"].values

### Step 3: Configure the Model

Next, we'll configure a simple Keras model using the `Sequential` class. We'll use only one [`Dense`](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/Dense) layer, and then we'll train this model using the [Stochastic Gradient Descent](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/optimizers/SGD) optimizer `SGD`. The `learning_rate` argument controls the size of the gradient step.

**NOTE:** To be safe, we also apply [gradient clipping](https://developers.google.com/machine-learning/glossary/#gradient_clipping) to our optimizer via the `clipnorm` parameter. Gradient clipping ensures that the magnitude of the gradients does not become too large during training, which can cause gradient descent to fail.
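Conceptually, norm clipping rescales a gradient whose L2 norm exceeds the threshold so that its norm equals the threshold, and leaves smaller gradients untouched. The sketch below assumes clipping is applied to each gradient tensor separately with a threshold of 5.0, matching `clipnorm=5.0`; the function name `clip_gradient_by_norm` is illustrative and is not a Keras internal.

```python
import numpy as np


def clip_gradient_by_norm(grad, clipnorm=5.0):
    """Rescale grad so that its L2 norm is at most clipnorm (illustrative sketch)."""
    norm = np.linalg.norm(grad)
    if norm <= clipnorm:
        # Small gradients pass through unchanged.
        return grad
    # Large gradients keep their direction but are shrunk to length clipnorm.
    return grad * (clipnorm / norm)


big = np.array([30.0, 40.0])   # L2 norm 50, gets scaled down to norm 5
small = np.array([0.3, 0.4])   # L2 norm 0.5, left as is
clipped_big = clip_gradient_by_norm(big)
clipped_small = clip_gradient_by_norm(small)
```

Because only the length changes, a clipped step still points downhill; it just can't jump arbitrarily far when a single example produces a huge gradient.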

In [0]:
# A single Dense unit with linear activation emulates the LinearRegressor estimator from TF1.
model = tf.keras.models.Sequential([
    layers.Dense(1, activation='linear', kernel_initializer='random_normal')
])

# Compile the model with SGD as the optimizer, MSE as the loss, and RMSE as a metric.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.0000001, clipnorm=5.0),
              loss='mse',
              metrics=[tf.keras.metrics.RootMeanSquaredError()])


### Step 4: Train the Model

We can now call `fit()` on our `model` to train it, and to start, we'll train for 100 steps.
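Each training step computes the gradient of the loss on one batch and moves the weight and bias a small distance against it. For a one-feature linear model with MSE loss, a single step can be sketched in plain NumPy as below. This is a simplified illustration that ignores gradient clipping and Keras internals; the helper name `sgd_step` and the toy data are hypothetical.

```python
import numpy as np


def sgd_step(w, b, x, y, learning_rate):
    """One gradient-descent step for y_hat = w * x + b with mean squared error loss."""
    error = w * x + b - y
    # Gradients of mean((w * x + b - y) ** 2) with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Toy batch of two examples lying exactly on y = 2x.
x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])

# The first step moves w from 0 to about 1.0 and b from 0 to about 0.6.
w, b = sgd_step(0.0, 0.0, x, y, learning_rate=0.1)

# Many more steps bring the line close to w = 2, b = 0.
for _ in range(2000):
    w, b = sgd_step(w, b, x, y, learning_rate=0.1)
```

With `batch_size=1`, each of the notebook's 100 steps performs this kind of update using a single city block.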

In [0]:
history = model.fit(features, labels, epochs=1, steps_per_epoch=100, batch_size=1, verbose=0).history
root_mean_squared_error = history["root_mean_squared_error"][0]
mean_squared_error = history["loss"][0]
print("Mean Squared Error (on training data): %0.3f" % mean_squared_error)
print("Root Mean Squared Error (on training data): %0.3f" % root_mean_squared_error)

**NOTE:** Training error measures how well your model fits the training data, but it **_does not_** measure how well your model **_generalizes to new data_**. In later exercises, you'll explore how to split your data to evaluate your model's ability to generalize.


Is this a good model? How would you judge how large this error is?

Mean Squared Error (MSE) can be hard to interpret, so we often look at Root Mean Squared Error (RMSE)
instead.  A nice property of RMSE is that it can be interpreted on the same scale as the original targets.
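Concretely, MSE averages squared errors, so its units are squared (here, thousands of dollars squared), while RMSE takes the square root and brings the value back to thousands of dollars. A plain-Python sketch with made-up numbers:

```python
import math


def mse(predictions, targets):
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def rmse(predictions, targets):
    """Square root of MSE, expressed in the same units as the targets."""
    return math.sqrt(mse(predictions, targets))


# Hypothetical median house values in thousands of dollars.
predictions = [100.0, 200.0, 300.0]
targets = [110.0, 190.0, 330.0]
print(mse(predictions, targets))   # squared units: hard to compare with the targets
print(rmse(predictions, targets))  # thousands of dollars: directly comparable
```

Here the errors are 10, 10, and 30 thousand dollars, and the RMSE of roughly 19 lands on that same scale, whereas the MSE of about 367 does not.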

Let's compare the RMSE to the difference of the min and max of our targets:

In [0]:
min_house_value = california_housing_dataframe["median_house_value"].min()
max_house_value = california_housing_dataframe["median_house_value"].max()
min_max_difference = max_house_value - min_house_value

print("Min. Median House Value: %0.3f" % min_house_value)
print("Max. Median House Value: %0.3f" % max_house_value)
print("Difference between Min. and Max.: %0.3f" % min_max_difference)
print("Root Mean Squared Error: %0.3f" % root_mean_squared_error)

Our error is very large relative to the range of the target values. Can we do better?

This is the question that nags at every model developer. Let's develop some basic strategies to reduce model error.

The first thing we can do is take a look at how well our predictions match our targets, in terms of overall summary statistics.

In [0]:
calibration_data = pd.DataFrame()
calibration_data["predictions"] = model.predict_on_batch(features).flatten()
calibration_data["targets"] = pd.Series(labels)
calibration_data.describe()

Okay, maybe this information is helpful. How does the mean value compare to the model's RMSE? How about the various quantiles?

We can also visualize the data and the line we've learned.  Recall that linear regression on a single feature can be drawn as a line mapping input *x* to output *y*.
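With linear activation, `Dense(1)` computes y = x·W + b. For a single input feature, `get_weights()` returns a kernel of shape `(1, 1)` and a bias of shape `(1,)`, which is why the plotting code below flattens each array and takes element 0. A NumPy sketch with made-up weights (the values are assumptions, not trained results):

```python
import numpy as np

# Hypothetical values shaped like Dense(1).get_weights() for a one-feature input:
# a (1, 1) kernel and a (1,) bias.
kernel = np.array([[2.0]])
bias = np.array([1.0])

# Inputs shaped (num_examples, 1), like `features`.
x = np.array([[0.0], [3.0]])

# A linear Dense layer computes x @ kernel + bias.
y = x @ kernel + bias

# Extract scalars the same way the plotting code does.
weight, b = kernel.flatten()[0], bias.flatten()[0]
```

The scalar form `weight * x + b` gives the same values as the matrix form, which is what lets us draw the model as a straight line.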

First, we'll get a uniform random sample of the data so we can make a readable scatter plot.

In [0]:
sample = california_housing_dataframe.sample(n=300)


Next, we'll plot the line we've learned, drawing from the model's bias term and feature weight, together with the scatter plot. The line will show up red.
We also need a helper function to save our plots to TensorBoard.

In [0]:
def plot_to_image(figure):
  """Converts the matplotlib plot specified by 'figure' to a PNG image and
  returns it. The supplied figure is closed and inaccessible after this call."""
  # Save the plot to a PNG in memory.
  buf = io.BytesIO()
  figure.savefig(buf, format='png')
  # Closing the figure prevents it from being displayed directly inside
  # the notebook.
  plt.close(figure)
  buf.seek(0)
  # Convert PNG buffer to TF image
  image = tf.image.decode_png(buf.getvalue(), channels=4)
  # Add the batch dimension
  image = tf.expand_dims(image, 0)
  return image

In [0]:
# Remove plots logged by previous runs.

!rm -rf logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard/plots

In [0]:
# Create a new figure and make it the current one.
figure = plt.figure()

# Get the min and max total_rooms values.
x_0 = sample["total_rooms"].min()
x_1 = sample["total_rooms"].max()

# Retrieve the final weight and bias generated during training.
weight, bias = [x.flatten()[0] for x in model.layers[0].get_weights()]

# Get the predicted median_house_values for the min and max total_rooms values.
y_0 = weight * x_0 + bias
y_1 = weight * x_1 + bias

# Plot our regression line from (x_0, y_0) to (x_1, y_1).
plt.plot([x_0, x_1], [y_0, y_1], c='r')

# Label the graph axes.
plt.ylabel("median_house_value")
plt.xlabel("total_rooms")

# Plot a scatter plot from our data sample.
plt.scatter(sample["total_rooms"], sample["median_house_value"])

# Log the figure as an image to a timestamped TensorBoard run directory.
logdir = "logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard/plots/" + datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir)
with file_writer.as_default():
  tf.summary.image("Learned Line over scatter plot from sample data",
                   plot_to_image(figure),
                   step=0)

Run TensorBoard to show our graph.

In [0]:
%tensorboard --logdir logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard/plots

This initial line looks way off.  See if you can look back at the summary stats and see the same information encoded there.

Together, these initial sanity checks suggest we may be able to find a much better line.

## Tweak the Model Hyperparameters
For this exercise, we've written a function that trains the model. Call it with different parameters to see their effect.

The function trains for 10 epochs so that we can observe how the model improves after each one.

After each epoch, we'll print the training RMSE. This may help you judge when a model has converged, or whether it needs more iterations.
We'll also plot the line defined by the weight and bias the model has learned so far. This is another way to see how things converge.
To do this, we'll implement a simple [Lambda](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/callbacks/LambdaCallback) [callback](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/callbacks).

We'll also use the [TensorBoard](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/callbacks/TensorBoard) callback to log the loss (MSE) and the RMSE.

In [0]:
!rm -rf logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard/

In [0]:
def fit_model(learning_rate=0.001, steps_per_epoch=100, batch_size=1, input_feature="total_rooms"):
  """Trains a linear regression model of one feature.
  
  Args:
    learning_rate: A `float`, the learning rate.
    steps_per_epoch: A non-zero `int`, the number of training steps per epoch (the model
      trains for 10 epochs). A training step consists of a forward and backward pass using a single batch.
    batch_size: A non-zero `int`, the batch size.
    input_feature: A `string` specifying a column from `california_housing_dataframe`
      to use as input feature.
  """
  epochs = 10
  features = california_housing_dataframe[[input_feature]].values
  label = "median_house_value"
  labels = california_housing_dataframe[label].values
  
  model = tf.keras.models.Sequential([
      layers.Dense(1, activation='linear', kernel_initializer='random_normal', bias_initializer='random_normal')
  ])
  model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate, clipnorm=5.0),
                loss='mse',
                metrics=[tf.keras.metrics.RootMeanSquaredError()])
  
  sample = california_housing_dataframe.sample(n=300)
  # Share one timestamp so the plots and scalars of a run are easy to match up.
  timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
  logdir = "logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard/plots/" + timestamp
  scalars_logdir = "logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard/scalars/" + timestamp
  file_writer = tf.summary.create_file_writer(logdir)
  
  # Set up to plot the state of our model's line each epoch.
  def create_plt_params(feature, label, epochs=10):
    # Colormap inputs are expected in [0, 1]; pick one evenly spaced color per epoch.
    colors = [cm.coolwarm(x) for x in np.linspace(0, 1, epochs)]
    return (colors,
            (sample[feature].min(), sample[feature].max()),
            (0, sample[label].max()))
    
  def create_figure(feature, label, epochs=10):
    figure = plt.figure(figsize=(15, 6))
    plt.title("Learned Line by Epoch")
    plt.ylabel(label)
    plt.xlabel(feature)
    plt.scatter(sample[feature], sample[label])
    return figure

  colors, x_min_max, y_min_max = create_plt_params(input_feature, label, epochs)

  def log(epoch, logs):
    root_mean_squared_error = logs["root_mean_squared_error"]
    print("  epoch %02d : %0.2f" % (epoch, root_mean_squared_error))

    weight, bias = [x.flatten()[0] for x in model.layers[0].get_weights()]

    # Apply some math to ensure that the data and line are plotted neatly.
    y_extents = np.array(y_min_max)
    x_extents = (y_extents - bias) / weight
    x_extents = np.maximum(np.minimum(x_extents,
                                      x_min_max[1]),
                           x_min_max[0])
    y_extents = weight * x_extents + bias
    figure = create_figure(input_feature, label, epochs)
    plt.plot(x_extents, y_extents, color=colors[epoch]) 
    with file_writer.as_default():
      tf.summary.image("Learned Line by Epoch",
                       plot_to_image(figure),
                       step=epoch)
      
  model_callback = tf.keras.callbacks.LambdaCallback(
      on_epoch_end=lambda epoch, logs: log(epoch, logs))
  tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=scalars_logdir)
  
  print("Train model...")
  print("RMSE (on training data):")
  model.fit(features,
            labels,
            epochs=epochs,
            steps_per_epoch=steps_per_epoch,
            batch_size=batch_size,
            callbacks=[model_callback, tensorboard_callback],
            verbose=0)
  print("Model training finished.")

  calibration_data = pd.DataFrame()
  calibration_data["predictions"] = model.predict_on_batch(features).flatten()
  calibration_data["targets"] = pd.Series(labels)
  display(calibration_data.describe())
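The clipping step inside `log()` solves for the x values where the learned line crosses the bottom and top of the y range, clamps them to the sampled feature range, and then recomputes y so both endpoints stay on the line. Here is the same computation pulled out into a standalone NumPy function; the name `line_extents` and the numbers are illustrative, and `np.clip` stands in for the nested `np.maximum`/`np.minimum` calls. Like the original, it assumes a non-zero weight.

```python
import numpy as np


def line_extents(weight, bias, x_min_max, y_min_max):
    """Endpoints of y = weight * x + bias, clipped to the plotted region (illustrative)."""
    y_extents = np.array(y_min_max, dtype=float)
    # x values where the line reaches the bottom and top of the y range.
    x_extents = (y_extents - bias) / weight
    # Clamp those x values to the sampled feature range.
    x_extents = np.clip(x_extents, x_min_max[0], x_min_max[1])
    # Recompute y at the clamped x values so the endpoints stay on the line.
    return x_extents, weight * x_extents + bias


xs, ys = line_extents(weight=2.0, bias=0.0, x_min_max=(0.0, 3.0), y_min_max=(0.0, 10.0))
```

In this example the line would leave the top of the plot at x = 5, beyond the largest sampled feature value, so the right endpoint is pulled back to x = 3, y = 6.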

## Task 1:  Achieve an RMSE of 180 or Below

Tweak the model hyperparameters to improve loss and better match the target distribution.
If, after 5 minutes or so, you're having trouble beating an RMSE of 180, check the solution for a possible combination.

In [0]:
fit_model(learning_rate=0.000001,
          steps_per_epoch=100,
          batch_size=1,
          input_feature="total_rooms")

Run TensorBoard to view the learned-line images and the RMSE graphs.

In [0]:
%tensorboard --logdir logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard

### Solution

Here is one possible solution.

In [0]:
fit_model(learning_rate=0.00002,
          steps_per_epoch=500,
          batch_size=5,
          input_feature="total_rooms")

This is just one possible configuration; there may be other combinations of settings that also give good results. Note that, in general, this exercise isn't about finding the *one best* setting, but about building your intuitions about how tweaking the model configuration affects prediction quality.
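One rough way to compare configurations is to count how many training examples the model processes in total, `epochs × steps_per_epoch × batch_size`, assuming each step consumes one full batch. The helper name below is hypothetical; the numbers come from the `fit_model` calls in this notebook. The count alone does not predict quality, because the learning rate matters too (the Task 1 solution also uses a learning rate 20 times larger than the starting call).

```python
def examples_seen(epochs, steps_per_epoch, batch_size):
    """Total training examples processed: one batch per step, for every epoch."""
    return epochs * steps_per_epoch * batch_size


# fit_model always trains for 10 epochs.
print(examples_seen(10, 100, 1))   # Task 1 starting call: 1000
print(examples_seen(10, 500, 5))   # Task 1 solution: 25000
print(examples_seen(10, 1000, 5))  # Task 2 solution: 50000
```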

### Is There a Standard Heuristic for Model Tuning?

This is a commonly asked question. The short answer is that the effects of different hyperparameters are data dependent. So there are no hard-and-fast rules; you'll need to test on your data.

That said, here are a few rules of thumb that may help guide you:

 * Training error should steadily decrease, steeply at first, and should eventually plateau as training converges.
 * If the training has not converged, try running it for longer.
 * If the training error decreases too slowly, increasing the learning rate may help it decrease faster.
   * But sometimes the exact opposite may happen if the learning rate is too high.
 * If the training error varies wildly, try decreasing the learning rate.
   * Lower learning rate plus larger number of steps or larger batch size is often a good combination.
 * Very small batch sizes can also cause instability.  First try larger values like 100 or 1000, and decrease until you see degradation.
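The learning-rate rules of thumb can be seen on a toy one-parameter problem: minimizing f(w) = (w - 3)^2 with plain gradient descent. This is an illustration on made-up values, not a tuning recipe for the housing model; the function name and the three learning rates are assumptions chosen to show the three regimes.

```python
def distance_after_gd(learning_rate, steps=20, start=0.0, target=3.0):
    """Run gradient descent on f(w) = (w - target)**2 and return |w - target|."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= learning_rate * grad
    return abs(w - target)


for lr in (0.01, 0.4, 1.1):
    print(lr, distance_after_gd(lr))
```

Each step multiplies the remaining error by `1 - 2 * learning_rate`: with 0.01 the error shrinks by only 2% per step and is still large after 20 steps, with 0.4 it shrinks fivefold per step and effectively vanishes, and with 1.1 the factor is -1.2, so every step overshoots further and training diverges.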

Again, never go strictly by these rules of thumb, because the effects are data dependent.  Always experiment and verify.

## Task 2: Try a Different Feature

See if you can do any better by replacing the `total_rooms` feature with the `population` feature.

Don't take more than 5 minutes on this portion.

In [0]:
!rm -rf logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard

In [0]:
fit_model(learning_rate=0.00001,
          steps_per_epoch=500,
          batch_size=10,
          input_feature="population")

In [0]:
%tensorboard --logdir logs/first_steps_with_tensor_flow_with_tf2_and_keras_plus_tensorboard

### Solution

Here is one possible solution.

In [0]:
fit_model(learning_rate=0.00002,
          steps_per_epoch=1000,
          batch_size=5,
          input_feature="population")