# Linear Regression using a TensorFlow Estimator

This notebook shows how we can train a linear regression model using the `tf.estimator` interface.

**Warning**: If you run this notebook twice (even after a kernel restart
and with the same random seed),
you will get different results. An Estimator restores the latest
checkpoint saved in its `model_dir` and resumes training from that
state instead of starting over. For a clean start, delete
that directory (called `lin_reg_est` in this notebook) before running.
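A minimal sketch of that clean start, meant to run before the estimator is created. It assumes the directory name `lin_reg_est` used below. The helper name `reset_model_dir` is hypothetical and not part of TensorFlow.

```python
import os
import shutil


def reset_model_dir(model_dir):
    """Hypothetical helper: delete an Estimator's model_dir so training starts fresh.

    Removes saved checkpoints and event files. Returns True if a directory
    was removed and False if there was nothing to delete.
    """
    if os.path.isdir(model_dir):
        shutil.rmtree(model_dir)
        return True
    return False


reset_model_dir('lin_reg_est')
```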

## Imports

In [1]:
import logging
import pickle

import tensorflow as tf

## Load in the data

In [2]:
with open('lin_reg_data.pkl', 'rb') as f:
    X_train, X_test, y_train, y_test = pickle.load(f)
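The cell above assumes `lin_reg_data.pkl` holds a single pickled 4-tuple in the order `(X_train, X_test, y_train, y_test)`. The notebook does not show how the file was created. The sketch below writes and reads back a file with that layout. The helper `write_example_pickle`, the made-up coefficients, the 10-point test split, and the temporary path are all assumptions for illustration, not the notebook's actual data.

```python
import os
import pickle
import tempfile

import numpy as np


def write_example_pickle(path, n_train=90, n_test=10, seed=0):
    """Hypothetical helper: pickle made-up data in the layout the notebook expects."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_train + n_test, 2))
    # Made-up coefficients and noise; these are NOT the notebook's data.
    y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.5, size=len(X))
    data = (X[:n_train], X[n_train:], y[:n_train], y[n_train:])
    with open(path, 'wb') as f:
        pickle.dump(data, f)


example_path = os.path.join(tempfile.mkdtemp(), 'example_data.pkl')
write_example_pickle(example_path)

# Reading it back mirrors the cell above (with different variable names,
# so the notebook's real X_train etc. are not overwritten).
with open(example_path, 'rb') as f:
    ex_X_train, ex_X_test, ex_y_train, ex_y_test = pickle.load(f)
```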

## Set up the estimator

In [3]:
n_pts, n_features = X_train.shape
n_pts, n_features

(90, 2)

In [4]:
feature_columns = [tf.feature_column.numeric_column('x', shape=n_features)]

In [5]:
lin_reg_model = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    model_dir='lin_reg_est'
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'lin_reg_est', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x118622898>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


## Train the estimator

In [6]:
# Training produces a lot of INFO log messages.
# We'll throw out some of those with a log filter function.
def keep_every_nth_info(n):
    i = -1
    def filter_record(record):
        nonlocal i
        i += 1
        return int(record.levelname != 'INFO' or i % n == 0)
    return filter_record
logging.getLogger('tensorflow').addFilter(keep_every_nth_info(5))
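To see what the filter keeps, here is a small stand-alone check on an ordinary `logging` logger. The logger name `demo` and the list-collecting `ListHandler` are illustrative choices, not part of the notebook. The counter advances on every record, so non-INFO records always pass but still use up counter positions. INFO records pass only when the running count is a multiple of `n`.

```python
import logging


def keep_every_nth_info(n):
    # Same filter as in the notebook cell above.
    i = -1
    def filter_record(record):
        nonlocal i
        i += 1
        return int(record.levelname != 'INFO' or i % n == 0)
    return filter_record


class ListHandler(logging.Handler):
    """Collects the messages of all records that reach it."""
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger('demo')
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger
handler = ListHandler()
logger.addHandler(handler)
# Since Python 3.2, addFilter accepts a plain callable as well as a Filter object.
logger.addFilter(keep_every_nth_info(5))

for step in range(12):
    logger.info('step %d', step)

print(handler.messages)  # ['step 0', 'step 5', 'step 10']
```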

In [7]:
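# input_fn returns the entire training set as one batch,
# so each of the 5000 steps is a full-batch update.
# Because training resumes from the checkpoint in model_dir,
# rerunning this cell trains 5000 *more* steps.
lin_reg_model.train(
    input_fn=lambda: ({'x': tf.convert_to_tensor(X_train)}, tf.convert_to_tensor(y_train)),
    steps=5000
)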

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:global_step/sec: 2094.29
INFO:tensorflow:loss = 124.36266, step = 401 (0.044 sec)
INFO:tensorflow:global_step/sec: 2289.9
INFO:tensorflow:loss = 92.20567, step = 901 (0.044 sec)
INFO:tensorflow:global_step/sec: 2411.79
INFO:tensorflow:loss = 77.32136, step = 1401 (0.044 sec)
INFO:tensorflow:global_step/sec: 2267.31
INFO:tensorflow:loss = 70.15773, step = 1901 (0.044 sec)
INFO:tensorflow:global_step/sec: 2206.56
INFO:tensorflow:loss = 66.68256, step = 2401 (0.046 sec)
INFO:tensorflow:global_step/sec: 2083.03
INFO:tensorflow:loss = 64.99036, step = 2901 (0.044 sec)
INFO:tensorflow:global_step/sec: 2315.03
INFO:tensorflow:loss = 64.16485, step = 3401 (0.044 sec)
INFO:tensorflow:global_step/sec: 2204.63
INFO:tensorflow:loss = 63.76175, step = 3901 (0.043 sec)
INFO:tensorflow:global_step/sec: 2200.75
INFO:tensorflow:loss = 63.564877, step = 4401 (0.044 sec)
INFO:tensorflow:global_step/sec: 2244.46


<tensorflow.python.estimator.canned.linear.LinearRegressor at 0x118622208>
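Because `input_fn` returns the whole training set on every call, each training step is a full-batch update of the sum-of-squares loss. `LinearRegressor` uses the FTRL optimizer by default, not plain gradient descent. The NumPy sketch below swaps in ordinary full-batch gradient descent on the same sum-of-squared-errors loss, only to illustrate what one step does. The function name `full_batch_gd`, the learning rate, and the step count are assumptions, not the estimator's settings.

```python
import numpy as np


def full_batch_gd(X, y, lr=0.001, steps=5000):
    """Plain full-batch gradient descent on the sum of squared errors.

    Illustrative stand-in for the estimator's training loop. It is NOT
    the FTRL optimizer that LinearRegressor uses by default.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        residual = X @ w + b - y              # shape (n,)
        # Gradient of sum((X w + b - y)**2) with respect to w and b.
        grad_w = 2.0 * X.T @ residual
        grad_b = 2.0 * residual.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Because the loss is a sum rather than a mean, its gradient grows with the number of points. So a learning rate that is stable for 90 points may be too large for a much bigger dataset.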

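Note that with this TensorFlow version's default loss reduction, the
training loss logged above is the sum of squared errors over all 90
training points, not the mean squared error (MSE). The next cell
computes the MSE, which `evaluate` reports as `average_loss`.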
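The two quantities differ only by a division by the number of points. The helper names below are hypothetical. As a rough check against the logs, the training loss of about 63.56 at step 4401, divided by 90 points, is about 0.706. That is close to the `average_loss` of about 0.705 reported after the final step.

```python
def sum_squared_error(y_true, y_pred):
    # Sum over all points, like the training `loss` in the logs above.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mean_squared_error(y_true, y_pred):
    # Divided by the number of points, like `average_loss` from evaluate.
    return sum_squared_error(y_true, y_pred) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 4.0]
print(sum_squared_error(y_true, y_pred))   # 1.25
print(mean_squared_error(y_true, y_pred))  # 0.3125
print(63.564877 / 90)                      # about 0.706
```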

In [8]:
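# Evaluate on the training data as a single full batch.
# 'average_loss' is the mean squared error per training point.
metrics = lin_reg_model.evaluate(
    input_fn=lambda: ({'x': tf.convert_to_tensor(X_train)}, tf.convert_to_tensor(y_train)),
    steps=1
)
metrics['average_loss']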

INFO:tensorflow:Starting evaluation at 2018-08-07-19:42:24
INFO:tensorflow:Evaluation [1/1]


0.70507103

In [9]:
# Check the learned weights and bias.
# (lin_reg_model.get_variable_names() lists all saved variable names.)
print(lin_reg_model.get_variable_value('linear/linear_model/x/weights'))
print(lin_reg_model.get_variable_value('linear/linear_model/bias_weights'))

[[3.4154036]
 [4.4533696]]
[0.7958069]
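One way to sanity-check these numbers is to compare them with the ordinary least-squares solution. That solution minimizes the same sum of squared errors in closed form. The sketch below uses NumPy's `np.linalg.lstsq` and appends a column of ones for the bias. It assumes `X_train` is an `(n, 2)` array and `y_train` has `n` entries. The helper `ols_fit` is hypothetical. With enough training steps, the estimator's weights and bias should approach this solution.

```python
import numpy as np


def ols_fit(X, y):
    """Hypothetical helper: closed-form least squares, returning (weights, bias)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()      # accept shape (n,) or (n, 1)
    A = np.column_stack([X, np.ones(len(X))])   # extra column models the bias
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


# Usage in this notebook (not run here):
# weights, bias = ols_fit(X_train, y_train)
```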


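As far as I could determine, the pre-made `LinearRegressor` has no argument
for adding a regularization penalty to the loss. The closest built-in option
goes through the optimizer. `LinearRegressor` uses FTRL by default, and an
explicitly constructed `tf.train.FtrlOptimizer` accepts
`l1_regularization_strength` and `l2_regularization_strength`. Such an
optimizer can be passed through the estimator's `optimizer` argument.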
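For comparison, here is what an explicit L2 (ridge) penalty on the weights does in the closed-form least-squares setting. This is a NumPy sketch of textbook ridge regression, not something `LinearRegressor` computes. The helper name `ridge_fit`, the penalty strength `alpha`, and the choice to leave the bias unpenalized are assumptions.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Minimize sum((X w + b - y)**2) + alpha * sum(w**2); the bias b is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    # Centering removes the bias from the problem, so only w is shrunk.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b
```

With `alpha=0` this reduces to ordinary least squares. Larger values shrink the weight vector toward zero.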