<a href="https://colab.research.google.com/github/vijayko/Computer-Science-II-lab-/blob/master/Lab_2_Training_Your_First_TF_Linear_Regression_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2018 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lab 2: Training Your First TF Linear Regression Model
**Learning Objectives**
* Use pyplot to help visualize the data, the learned model, and how the loss is evolving during training.
* Learn how to set up the features in TensorFlow to train a model.
* Use the LinearRegressor class in TensorFlow to predict a real-valued label based on one real-valued input feature.
* Evaluate the accuracy of a model's predictions using Root Mean Squared Error (RMSE) and understand the learning curve.
* Improve the accuracy of a model by tuning the learning rate and number of training steps.

### Imports and Pandas Options
We import the libraries we are using and set some pandas options.

In [0]:
import fnmatch
import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

# This line limits logging to error messages only. You can lower the
# threshold (e.g. tf.logging.INFO) if you want more logging.
tf.logging.set_verbosity(tf.logging.ERROR)

# Set the output display to have two digits for decimal places, for display
# readability only and limit it to printing 15 rows.
pd.options.display.float_format = '{:.2f}'.format
pd.options.display.max_rows = 15

### Data Set
As in the last lab, we use the [Automobile Data Set](https://archive.ics.uci.edu/ml/datasets/automobile) from the 1985 Ward's Automotive Yearbook, which is part of the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets).

### Loading and Randomizing the Data
Load the data using the column names from the [Automobile Data Set](https://archive.ics.uci.edu/ml/datasets/automobile). When using SGD (stochastic gradient descent) for training, it is important that **each batch is a random sample of the data** so that the computed gradient is representative. While there appears to be no order to this data set, it is always good practice to shuffle the data into a random order.


In [0]:
# Provide the names for the columns since the CSV file with the data does
# not have a header row.
cols = ['symboling', 'losses', 'make', 'fuel-type', 'aspiration', 'num-doors',
        'body-style', 'drive-wheels', 'engine-location', 'wheel-base',
        'length', 'width', 'height', 'weight', 'engine-type', 'num-cylinders',
        'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-ratio',
        'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']


# Load in the data from a CSV file that is comma separated.
car_data = pd.read_csv('https://storage.googleapis.com/ml_universities/cars_dataset/cars_data.csv',
                        sep=',', names=cols, header=None, encoding='latin-1')

# We'll then randomize the data, just to be sure not to get any pathological
# ordering effects that might harm the performance of Stochastic Gradient
# Descent.
car_data = car_data.reindex(np.random.permutation(car_data.index))

### Visualizing a Linear Model Using a Scatter Plot

When training a linear regression model over a single variable, it is helpful to show the model (which is just a line) as part of the scatter plot, because that lets you see how well the model fits the data. The loss (RMSE here) alone doesn't really indicate how good the model is. Sometimes you want to show several models on the same scatter plot to compare them, so we allow `slopes`, `biases`, and `model_names` to all be lists. They should be the same size, giving the weight (slope), bias, and name (to use in the legend) for each model.

In [0]:
def scatter_plot(features, targets, slopes=[], biases=[], model_names=[]):
  """ Creates a scatter plot of input_feature vs target along with the models.
  
  Args:
    features: list of the input features
    targets: list of targets
    slopes: list of model weights (slopes)
    biases: list of model biases (same size as slopes)
    model_names: list of model names to use for legend (same size as slopes)
  """      
  # Define some colors to use that go from blue towards red
  colors = [cm.coolwarm(x) for x in np.linspace(0, 1, len(slopes))]
  
  # Generate the Scatter plot
  plt.ylabel("target")
  plt.xlabel("input feature")
  plt.scatter(features, targets, color='black', label="")
  
  # Add the lines corresponding to the provided models
  for i in range (0, len(slopes)):
    y_0 = slopes[i] * features.min() + biases[i]
    y_1 = slopes[i] * features.max() + biases[i]
    plt.plot([features.min(), features.max()], [y_0, y_1],
             label=model_names[i], color=colors[i])
  if (len(model_names) > 0):
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

### Converting Missing Numerical Values to the Column Mean

As you hopefully found in a previous exercise, a good option for replacing missing entries (NaN) is to replace them by the column mean.  We do that here.

In [0]:
car_data['price'] = pd.to_numeric(car_data['price'], errors='coerce')
car_data['horsepower'] = pd.to_numeric(car_data['horsepower'], errors='coerce')
car_data['peak-rpm'] = pd.to_numeric(car_data['peak-rpm'], errors='coerce')
car_data['city-mpg'] = pd.to_numeric(car_data['city-mpg'], errors='coerce')
car_data['highway-mpg'] = pd.to_numeric(car_data['highway-mpg'], errors='coerce')

# Replace NaN with the column mean, storing the result in the same table
# (`inplace`). numeric_only=True skips the text columns when computing means.
car_data.fillna(car_data.mean(numeric_only=True), inplace=True)
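
To make the two steps above concrete, here is a minimal pure-Python sketch of what coercing to numbers and filling with the mean do conceptually. The helper names `to_number` and `fill_with_mean`, the sample values, and the use of `"?"` as the missing-value marker are assumptions for illustration only; the notebook itself relies on pandas for this.

```python
import math


def to_number(text):
    """Parse text as a float, returning NaN when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def fill_with_mean(values):
    """Replace NaN entries with the mean of the non-NaN entries."""
    present = [v for v in values if not math.isnan(v)]
    mean = sum(present) / len(present)
    return [mean if math.isnan(v) else v for v in values]


# Hypothetical raw column; "?" stands in for a missing entry.
raw_horsepower = ["111", "154", "?", "102"]
parsed = [to_number(text) for text in raw_horsepower]
filled = fill_with_mean(parsed)
print(filled)
```

The missing entry ends up equal to the average of the three known values, so it neither raises nor lowers the column mean.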

### Build Your First TensorFlow Model

We now build a model to predict `price`, which will be our label (sometimes also called a target) using `horsepower` as our input feature. To train our model, we'll use the [LinearRegressor](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearRegressor) interface provided by the TensorFlow [Estimator](https://www.tensorflow.org/get_started/estimator) API. This API takes care of a lot of the low-level model plumbing, and exposes convenient methods for performing model training, evaluation, and inference.

### Prepare Features
As our learning models get more sophisticated we will want to do some computation on the features and even generate new features from the existing ones. We will see examples of this in later labs.  For now this method will just make a copy of a portion of the dataframe.

In [0]:
def prepare_features(dataframe):
  """Prepares the features for provided dataset.

  Args:
    dataframe: A Pandas DataFrame expected to contain data from the
      desired data set.
  Returns:
    A new DataFrame that contains the features to be used for the model.
  """
  processed_features = dataframe.copy()
  return processed_features

### Generate the Training Examples
We simply call `prepare_features` on the `car_data` dataframe.

In [0]:
training_examples = prepare_features(car_data)

### Setting Up the Feature Columns for TensorFlow

In order to import our training data into TensorFlow, we need to specify what type of data each feature contains. There are two main types of data we'll use in this and future exercises:

* **Numerical Data**: Data that is a number (integer or float) and that you want to treat as a number. As we will discuss more later, sometimes you might want to treat numerical data (e.g., a postal code) as if it were categorical.

* **Categorical Data**: Data that is textual such as `make` or `fuel-type`.

In TensorFlow, we indicate a feature's data type using a construct called a **feature column**. Feature columns store only a description of the feature data; they do not contain the feature data itself.

For now, we will just use numerical features.  Later you will learn how to use categorical data.

In [0]:
NUMERICAL_FEATURES = ["horsepower"]

def construct_feature_columns():
  """Construct the TensorFlow Feature Columns.

  Returns:
    A set of feature columns
  """ 
  return set([tf.feature_column.numeric_column(feature)
              for feature in NUMERICAL_FEATURES])

### Input Function
To import our data into a LinearRegressor, we need to define an input function, which instructs TensorFlow how to preprocess the data, as well as how to batch, shuffle, and repeat it during model training.

First, we'll convert our Pandas feature data into a dictionary of NumPy arrays. We can then use the TensorFlow Dataset API to construct a dataset object from our data, and then break our data into batches of batch_size, to be repeated for the specified number of epochs (num_epochs).

When the default value of num_epochs=None is passed to [Dataset.repeat()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#repeat), the input data will be repeated indefinitely.

Next, if shuffle is set to True, we'll shuffle the data so that it's passed to the model randomly during training. We'll shuffle the data using [Dataset.shuffle()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle), which receives a parameter buffer_size that specifies the size of the dataset from which shuffle will randomly sample.

Finally, our input function constructs an iterator for the dataset and returns the next batch of data to the LinearRegressor.

**NOTE:** We'll continue to use this same input function in later exercises. For more
detailed documentation of input functions and the `Dataset` API, see the [TensorFlow Programmer's Guide](https://www.tensorflow.org/programmers_guide/datasets).

In [0]:
def input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    """Defines a function to preprocess the data, as well as how to batch,
      shuffle, and repeat it during model training.
  
    Args:
      features: pandas DataFrame of features
      targets: pandas DataFrame of targets
      batch_size: Size of batches to be passed to the model
      shuffle: True or False. Whether to shuffle the data.
      num_epochs: Number of epochs for which data should be repeated. None = repeat indefinitely
    Returns:
      Tuple of (features, labels) for next data batch
    """
    
    # Convert pandas data into a dict of np arrays.
    features = {key:np.array(value) for key,value in dict(features).items()}                                           
 
    # Construct a dataset, and configure batching/repeating.
    ds = Dataset.from_tensor_slices((features,targets)) # warning: 2GB limit
    ds = ds.repeat(num_epochs)
    if shuffle:
      ds = ds.shuffle(10000)
    ds = ds.batch(batch_size)
    
    # Return the next batch of data.
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
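
The repeat, shuffle, and batch behavior described above can be sketched in pure Python. This is a simplified illustration, not how the `Dataset` API is implemented: the generator name `batches` and its `seed` parameter are hypothetical, and it reshuffles the whole data set once per epoch rather than sampling from a fixed-size buffer.

```python
import itertools
import random


def batches(examples, batch_size, shuffle=True, num_epochs=None, seed=0):
    """Yield lists of batch_size examples, repeating the data num_epochs times.

    num_epochs=None repeats forever, mirroring the notebook's default.
    """
    rng = random.Random(seed)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    batch = []
    for _ in epochs:
        order = list(examples)
        if shuffle:
            rng.shuffle(order)
        for example in order:
            batch.append(example)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # Final partial batch once the data runs out.
        yield batch


data = [1, 2, 3, 4, 5]

# One unshuffled pass, as used for prediction.
one_pass = list(batches(data, batch_size=2, shuffle=False, num_epochs=1))
print(one_pass)  # [[1, 2], [3, 4], [5]]

# Endless shuffled batches, as used for training; take only the first three.
first_three = list(itertools.islice(batches(data, batch_size=2), 3))
print(first_three)
```

Because repeating happens before batching, a batch can straddle the boundary between two passes over the data, which is why an endless stream never produces a short batch.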

### Configure the LinearRegressor

Next, we'll configure a linear regression model using LinearRegressor. We'll train this model using the `GradientDescentOptimizer`, which implements Mini-Batch Stochastic Gradient Descent (SGD). The `learning_rate` argument controls the size of the gradient step.

**NOTE:** To be safe, we also apply [gradient clipping](https://developers.google.com/machine-learning/glossary/#gradient_clipping) to our optimizer via `clip_gradients_by_norm`. Gradient clipping ensures the magnitude of the gradients does not become too large during training, which can cause gradient descent to fail.

In [0]:
def define_linear_regression_model(learning_rate):
  """ Defines a linear regression model of one feature to predict the target.
  
  Args:
    learning_rate: A `float`, the learning rate
    
  Returns:
    A linear regressor created with the given learning rate
  """
  
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
  optimizer = tf.contrib.estimator.clip_gradients_by_norm(optimizer, 5.0)
  linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=construct_feature_columns(),
    optimizer=optimizer
  )  
  return linear_regressor
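
As a rough illustration of the idea behind clipping by norm, the sketch below rescales a gradient vector whenever its length exceeds a threshold. The function `clip_by_norm` here is a hypothetical pure-Python stand-in, not the TensorFlow implementation, and it treats the gradient as a single flat list.

```python
import math


def clip_by_norm(gradients, clip_norm):
    """Scale gradients so their L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= clip_norm:
        return list(gradients)  # Already small enough; leave unchanged.
    scale = clip_norm / norm
    return [g * scale for g in gradients]


# A gradient of length 50 is shrunk to length 5; its direction is preserved.
clipped = clip_by_norm([30.0, 40.0], clip_norm=5.0)
print(clipped)

# A gradient already within the limit passes through untouched.
small = clip_by_norm([0.3, 0.4], clip_norm=5.0)
print(small)
```

Clipping changes only the size of the step, never its direction, so a very large gradient still moves the weights the right way, just not as far.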

### Train the Model

We now have all the pieces we need to train a model.

In [0]:
NUMERICAL_FEATURES = ["horsepower"]
CATEGORICAL_FEATURES = []
LABEL = "price"

# Create a regression model using the define_linear_regression_model function
# that we defined earlier.
linear_regressor = define_linear_regression_model(learning_rate = 1)

train_input_fn = lambda: input_fn(training_examples[NUMERICAL_FEATURES], 
                                  training_examples[LABEL], 
                                  batch_size=50)

# Train the predictor using 100 steps through the data.
_ = linear_regressor.train(
      input_fn=train_input_fn, steps=100
)
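
To give a feel for what a single training step does, here is a minimal pure-Python sketch of a gradient-descent update for a one-feature linear model. It assumes the loss is the squared error averaged over the batch; the function `sgd_step` and the sample batch are hypothetical, and the exact loss scaling TensorFlow uses internally may differ.

```python
def sgd_step(w, b, batch, learning_rate):
    """Apply one gradient-descent update for y = w * x + b under squared error."""
    n = len(batch)
    # Gradients of the mean squared error over the batch.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Hypothetical batch of (feature, label) pairs lying exactly on y = 2x + 1.
batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = sgd_step(w, b, batch, learning_rate=0.1)
print(round(w, 3), round(b, 3))
```

Each step moves the weight and bias a little in the direction that reduces the error on the current batch, and the learning rate scales how far they move.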

### Evaluate the Model

Let's make predictions on that training data, to see how well our model fit it during training.

**NOTE:** Training error measures how well your model fits the training data, but it **_does not_** measure how well your model **_generalizes to new data_**. In later exercises, you'll explore how to split your data to evaluate your model's ability to generalize.


In [0]:
features = training_examples[NUMERICAL_FEATURES]
targets = training_examples[LABEL]
training_predict_input_fn = lambda: input_fn(
    features, targets, num_epochs=1, shuffle=False)

# Call predict() on the linear_regressor to make predictions.
predictions = linear_regressor.predict(input_fn=training_predict_input_fn)

# Format predictions as a NumPy array, so we can calculate error metrics.
predictions = np.array([item['predictions'][0] for item in predictions])

# Print Mean Squared Error and Root Mean Squared Error.
mean_squared_error = metrics.mean_squared_error(predictions, targets)
root_mean_squared_error = math.sqrt(mean_squared_error)
print("Mean Squared Error (on training data): %0.3f" % mean_squared_error)
print("Root Mean Squared Error (on training data): %0.3f" % root_mean_squared_error)

### Looking at the Feature Weight (Slope) and Bias of Our Trained Model

TensorFlow provides an easy way to read the weights of a trained model by variable name. Here we retrieve the weight for our single feature along with the bias, and then draw the resulting line on the scatter plot.

In [0]:
w = linear_regressor.get_variable_value('linear/linear_model/horsepower/weights')[0]
b = linear_regressor.get_variable_value('linear/linear_model/bias_weights')[0]
scatter_plot(features.values, targets,[w], [b], ['trained model'])

### Computing the Loss
For now we use root mean squared error (RMSE) as our loss, since it is a standard measure of error for linear regression. However, to keep the procedure for training the model generic, we introduce a method to compute the loss that can be tailored to other types of problems. For this lab, our implementation returns the RMSE.


In [0]:
def compute_loss(predictions, targets):
  """ Computes the loss (RMSE) for linear regression.
  
  Args:
    predictions: a list of values predicted by the model being visualized
    targets: a list of the target values being predicted that must be the
             same size as predictions.
    
  Returns:
    The RMSE for the provided predictions and targets
  """      
  return math.sqrt(metrics.mean_squared_error(predictions, targets))
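
For reference, RMSE is the square root of the average squared difference between predictions and targets. The small worked example below uses a hypothetical helper named `rmse` and made-up numbers, written in plain Python so each step of the arithmetic is visible.

```python
import math


def rmse(predictions, targets):
    """Return the root mean squared error between two equal-length lists."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors are 1, 0, and -2, so the squared errors are 1, 0, and 4.
# Their mean is 5/3, and the square root of 5/3 is about 1.291.
value = rmse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
print(round(value, 4))  # 1.291
```

Because the result is in the same units as the target, an RMSE on this data set can be read directly as a typical error in price.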

### Learning Curve

Another important tool is a graph often called a **learning curve** that shows the loss being minimized on the y-axis and the training steps (time) on the x-axis.

In [0]:
def plot_learning_curve(training_losses):
  """ Plot the learning curve
  
  Args:
    training_losses: a list of losses to plot
  """        
  plt.ylabel('Loss')
  plt.xlabel('Training Steps')
  plt.plot(training_losses)

### Training Our Model with a Learning Curve and Scatter Plot

We now have all the pieces we need to train a model in a way that we can tune the learning rate and  visually evaluate how the model is performing.  In order to generate intermediate losses for the learning curve (and record as we are training), we divide the training into 10 periods.  After each period we compute the loss.  We also store the weight and bias of the model at that time so that we can then visually show how the model evolves in a scatter plot.  You are welcome to modify the number of periods but 10 seems to work out pretty well. 

We start by using a *scatter plot* as our visualization for understanding how the model evolves when training but this can only easily be done with a single numerical feature.  Later we will switch to a different visualization that can be used for any linear model.

In [0]:
# Function to train the model when there is a single numeric feature, which
# allows using the scatter plot as a visualization of the progression of the
# model.
def train_model_with_one_numerical_feature(linear_regressor, features, labels,
                                           steps, batch_size):
  """Trains a linear regression model.
  
  Args:
    linear_regressor: The regressor to train
    features: The list of input feature values
    labels: The label values
    steps: A non-zero `int`, the total number of training steps.
    batch_size: A non-zero `int`, the batch size.
    
  Returns:
    The trained regressor
  """
  # In order to see how the model evolves as we train it, we will divide the
  # steps into periods and show the model after each period.
  periods = 10
  steps_per_period = steps // periods  # Integer number of steps per period.
  
  # Set up the training_input_fn and predict_training_input_fn
  training_input_fn = lambda: input_fn(features, labels, batch_size=batch_size)
  predict_training_input_fn = lambda: input_fn(features, labels, num_epochs=1,
                                               shuffle=False)
  
  # Train the model, but do so inside a loop so that we can periodically assess
  # loss metrics.  We store the loss, slope (feature weight), bias, and a name
  # for the model when there is a single feature (which then allows us to plot
  # the model in a scatter plot).
  print("Training model...")
  training_losses = []
  slopes = []
  biases = []
  model_names = []

  for period in range (0, periods):
    # Call train to fit the regressor for steps_per_period steps
    _ = linear_regressor.train(input_fn=training_input_fn, steps=steps_per_period)

    # Use the predict method to compute the predictions from the current model
    predictions = linear_regressor.predict(input_fn=predict_training_input_fn)
    predictions = np.array([item['predictions'][0] for item in predictions])
   
    # Compute the loss between the predictions and the correct labels, append
    # the loss to the list of losses used to generate the learning curve after
    # training is complete and print the current loss
    loss = compute_loss(predictions, labels)
    training_losses.append(loss) 
    print("  Loss after period %02d : %0.3f" % (period, loss))
    
    # Add slope, bias and model_name to the lists to be used later to plot
    # the model after each training period.
    feature_weight = fnmatch.filter(linear_regressor.get_variable_names(),
                                    'linear/linear_model/*/weights')
    slopes.append(linear_regressor.get_variable_value(feature_weight[0])[0])
    biases.append(linear_regressor.get_variable_value(
        'linear/linear_model/bias_weights')[0])
    model_names.append("period_" + str(period))
      
  # Now that training is done print the final loss    
  print("Final Loss (RMSE) on the training data: %0.3f" % loss) 
  
  # Generate a figure with the learning curve on the left and a scatter plot
  # on the right
  plt.figure(figsize=(10, 5))
  plt.subplot(1, 2, 1)
  plt.title("Learning Curve (RMSE vs time)")
  plot_learning_curve(training_losses)
  plt.subplot(1, 2, 2)
  plt.tight_layout(pad=1.1, w_pad=3.0, h_pad=3.0)
 
  plt.title("Learned Model by Period on Scatter Plot")
  scatter_plot(features.values, labels, slopes, biases, model_names)   
  return linear_regressor

### Example Learning Curve When the Learning Rate is Too High

When the learning rate is too high, you will see the loss going up and down, indicating that you are making adjustments that are too large. When you see this happening, lower the learning rate (initially by a factor of 10, then by smaller amounts once you are close). In this case you are moving back and forth between having the slope too large and too small.

In [0]:
NUMERICAL_FEATURES = ["horsepower"]
CATEGORICAL_FEATURES = []
LABEL = "price"

LEARNING_RATE = 100
STEPS = 50

linear_regressor = define_linear_regression_model(learning_rate = LEARNING_RATE)
linear_regressor = train_model_with_one_numerical_feature(
    linear_regressor, training_examples[NUMERICAL_FEATURES],
    training_examples[LABEL], batch_size=50, steps=STEPS)
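
The back-and-forth behavior can be reproduced on a toy problem. The sketch below runs plain gradient descent on the one-parameter loss `(w - target)**2`, which is an assumption chosen for simplicity and not the notebook's model; the function `descend` and its values are hypothetical.

```python
def descend(learning_rate, steps, w=0.0, target=3.0):
    """Run gradient descent on loss(w) = (w - target)**2 and return each w."""
    history = [w]
    for _ in range(steps):
        gradient = 2 * (w - target)
        w = w - learning_rate * gradient
        history.append(w)
    return history


# With a learning rate this large, every step overshoots the best value of 3.
too_high = descend(learning_rate=1.05, steps=6)
sides = ["above" if w > 3.0 else "below" for w in too_high]
print(sides)
print([round(abs(w - 3.0), 3) for w in too_high])
```

The weight lands on alternate sides of the optimum at every step, and its distance from the optimum grows rather than shrinks, which is the one-dimensional analogue of the slope swinging between too large and too small.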

### Example Learning Curve When the Learning Rate is Too Low

When the learning rate is too low, the changes are too small. While this might eventually get you to a good solution, it would take far more steps than needed. Training time is roughly proportional to the number of steps, so you want to find a learning rate that gets you to a good solution as fast as you can. You can see for these settings that the learned model (the line you see in the scatter plot) is improving and would eventually get there, but it is taking much longer than needed to train.

In [0]:
NUMERICAL_FEATURES = ["horsepower"]
CATEGORICAL_FEATURES = []
LABEL = "price"

LEARNING_RATE = 0.001
STEPS = 10000

linear_regressor = define_linear_regression_model(learning_rate = LEARNING_RATE)
linear_regressor = train_model_with_one_numerical_feature(
    linear_regressor, training_examples[NUMERICAL_FEATURES],
    training_examples[LABEL], batch_size=50, steps=STEPS)
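
To see how strongly the step count depends on the learning rate, the sketch below counts how many gradient steps a toy one-parameter problem needs to get close to its optimum. The loss `(w - target)**2`, the function `steps_to_converge`, and the tolerance are all assumptions for illustration; the actual numbers for the car data will differ.

```python
def steps_to_converge(learning_rate, tolerance=1e-3, w=0.0, target=3.0,
                      max_steps=100000):
    """Count gradient steps on loss(w) = (w - target)**2 until w is close."""
    steps = 0
    while abs(w - target) > tolerance and steps < max_steps:
        w -= learning_rate * 2 * (w - target)
        steps += 1
    return steps


# Smaller learning rates need many more steps to reach the same accuracy.
for rate in (0.001, 0.01, 0.1):
    print(rate, steps_to_converge(rate))
```

On this toy problem, each tenfold reduction in the learning rate costs roughly ten times as many steps, which is why a rate that is merely safe can still be a poor choice.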

## Exercise: Modify the Hyperparameters to Get a Better Model (1 Point)
For this task, you can use the code block below that puts all the above code in a single cell for convenience. Focus on first finding a good learning rate and then adjusting the number of steps to be what you need to converge.


In [0]:
NUMERICAL_FEATURES = ["horsepower"]
CATEGORICAL_FEATURES = []
LABEL = "price"

## Fill in the rest of your solution here.  Feel free to introduce multiple
## code boxes if you want to see the solutions and learning curves from
## different options at the same time

In [0]:
"""
List at least 3 of the sets of hyperparameters you tried and the RMSE obtained.
Your primary goal is to get the lowest RMSE you can.  Once you've done that, a
secondary goal is to minimize the number of steps used since the computation
cost depends heavily on the number of steps.

Please Submit this with the results from the hyperparameters that you feel
worked best.

TYPE YOUR ANSWER IN THIS COMMENT
"""

## Exercise: Try a Different Input Feature (4 Points)

The choice of the hyperparameters depends a lot on the data set and what you are trying to learn.  In this task you will try to predict the price from highway mpg and find a good set of hyperparameters for this problem.

* Use highway-mpg instead of horsepower to predict price. You might want to start by just plotting the data.  What do you observe?
* What hyperparameters give you the best trained model that you can get?  Try to keep the number of training steps as small as you can while still training a good model.
* Did you have to change the hyperparameters a lot?  If you did, why do you think that might be the case?
* How does the RMSE for your model compare to the optimal RMSE?  Think about what you'll need to do in order to answer this question.

In [0]:
NUMERICAL_FEATURES = ["highway-mpg"]
CATEGORICAL_FEATURES = []
LABEL = "price"

## Fill in the rest of your solution here.

In [0]:
"""
TYPE YOUR ANSWERS TO THE GIVEN QUESTIONS HERE
"""