
How to show the loss curves of the training set and validation set at the same time using a custom estimator? #18858

Closed
zjy8006 opened this issue Apr 25, 2018 · 18 comments
Labels: type:feature Feature requests

@zjy8006

zjy8006 commented Apr 25, 2018

Hi, I recently used custom_estimator.py to build a regression model. In order to see how the loss changes on the training set and the validation set, I need to know how to show the loss curves of both sets at the same time. I tried the train_and_evaluate API of Estimator and got the following picture.
[screenshot: TensorBoard chart where the evaluation loss appears as a single point]
As it shows, the evaluation result is a single point, but I want a line like the training-set loss curve, just like the picture shown below.
[screenshot: desired chart with continuous training and validation loss curves]
Here is my system information:

  • Have I written custom code: N/A

  • OS: Tested on Windows 10 1709.

  • TensorFlow installed from: Anaconda 5.1.0 with Python 3.6.4

  • TensorFlow version: tested on tensorflow-gpu 1.7.0

  • CUDA/cuDNN version: 9.0 for TF 1.7

  • GPU model and memory: NVIDIA Quadro K2100M, 2 GB of memory

  • Bazel version: N/A

  • Exact command to reproduce: N/A
Here is the custom estimator (model_fn):

import tensorflow as tf


def my_dnn_regression_fn(features, labels, mode, params):
    top = tf.feature_column.input_layer(features, params['feature_columns'])

    for units in params.get('hidden_units', [20]):
        top = tf.layers.dense(inputs=top, units=units, activation=tf.nn.relu)

    output_layer = tf.layers.dense(inputs=top, units=1)

    output_layer = tf.cast(output_layer, tf.float64)
   
    predictions = tf.squeeze(output_layer, 1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        # In 'PREDICT' mode we only need to return predictions.
        return tf.estimator.EstimatorSpec(
            mode=mode, predictions={"predictions": predictions})

    # calculate the loss using mean squared error
    average_loss = tf.losses.mean_squared_error(labels, predictions)

    # Pre-made estimators use the total_loss instead of the average,
    # so report total_loss for compatibility.
    batch_size = tf.shape(labels)[0]
    total_loss = tf.to_float(batch_size) * average_loss

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = params.get("optimizer", tf.train.AdamOptimizer)
        optimizer = optimizer(params.get("learning_rate", None))
        train_op = optimizer.minimize(
            loss=average_loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(
            mode=mode, loss=total_loss, train_op=train_op)

    # In the evaluation mode we will calculate evaluation metrics.
    assert mode == tf.estimator.ModeKeys.EVAL

    # Calculate root mean squared error
    rmse = tf.metrics.root_mean_squared_error(labels, predictions)

    # Add the rmse to collection of evaluation metrics.
    eval_metrics = {"rmse": rmse}

    return tf.estimator.EstimatorSpec(
        mode=mode,
        # Report sum of error for compatibility with pre-made estimators.
        loss=total_loss,
        eval_metric_ops=eval_metrics)

And here I used the train_and_evaluate API like this:

    model = tf.estimator.Estimator(
        model_fn=my_dnn_regression_fn,
        model_dir="./models/temp",
        params={
            'feature_columns': feature_columns,
            'learning_rate': 0.1,
            'optimizer': tf.train.AdamOptimizer,
            'hidden_units': [20, 20, 20, 20]
        })
    train_spec = tf.estimator.TrainSpec(input_fn=input_train, max_steps=10000)
    eval_spec = tf.estimator.EvalSpec(input_fn=input_dev, steps=10000,
                                      throttle_secs=60, start_delay_secs=0)
    tf.estimator.train_and_evaluate(model, train_spec, eval_spec)

Did I set the parameters properly? Or is there another solution for this?

@tensorflowbutler tensorflowbutler added the stat:awaiting response Status - Awaiting response from author label Apr 25, 2018
@tensorflowbutler
Member

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

@whyboris

whyboris commented Jun 13, 2018

I suspect this is a possible answer:
https://stackoverflow.com/questions/40146428/show-training-and-validation-accuracy-in-tensorflow-using-same-graph
But it would be great if we could do this without having to write custom code.
I think it's particularly informative to see how the validation set is performing compared to the training set.
Could this be a simple toggle/view/filter (unsure what to call it) in TensorBoard?
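
For reference, the pattern in that answer boils down to writing the same scalar tag with two summary writers pointing at sibling log directories, so TensorBoard overlays the two runs on one chart. A minimal TF 1.x sketch (the paths and loss values here are purely illustrative, not taken from this issue):

import tensorflow as tf

# One scalar summary reused for both runs; the placeholder stands in for the real loss value.
loss_ph = tf.placeholder(tf.float32, shape=[], name='loss_value')
loss_summary = tf.summary.scalar('loss', loss_ph)

# Two writers in sibling directories => two runs ('train' and 'validation') in TensorBoard.
train_writer = tf.summary.FileWriter('./logs/train')
val_writer = tf.summary.FileWriter('./logs/validation')

with tf.Session() as sess:
    for step in range(100):
        # ... run a real training step here; these loss values are placeholders ...
        train_loss, val_loss = 1.0 / (step + 1), 1.2 / (step + 1)
        train_writer.add_summary(sess.run(loss_summary, {loss_ph: train_loss}), step)
        if step % 10 == 0:  # validate less often than you train
            val_writer.add_summary(sess.run(loss_summary, {loss_ph: val_loss}), step)

train_writer.close()
val_writer.close()

Pointing TensorBoard at ./logs then shows both curves on the same 'loss' chart, which is the view being asked for here.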

ps - I think the stat:awaiting response label can be ignored -- that information is irrelevant for this issue.

@bhack
Contributor

bhack commented Jun 28, 2018

Is this just a documentation gap (i.e. should it get a docs label)?

@whyboris

whyboris commented Jul 28, 2018

It is really helpful to see loss & accuracy right next to each other. I think it would be a great feature to have as a default setting. And it really is important to see the loss for training and validation together -- to see if they begin to diverge.

A rough proposal (not styled for TensorBoard, but still):

[mockup image: training and validation curves displayed together]

@cy89 cy89 added the type:feature Feature requests label Aug 11, 2018
@cy89

cy89 commented Aug 11, 2018

This seems like a feature request; @dsmilkov is this your territory?

@cy89 cy89 assigned dsmilkov and unassigned cy89 Aug 11, 2018
@dsmilkov
Contributor

I didn't work on the charts in TensorBoard, but @jart would be able to help/delegate here.

@dsmilkov dsmilkov assigned jart and unassigned dsmilkov Aug 27, 2018
@whyboris

Seems like a good feature to have. Unsure whether the @tensorflowbutler message above means the issue is going to get auto-closed or that it will now get more attention. Either way -- saying 'seems like a good feature to have' 😉

@cy89

cy89 commented Sep 11, 2018

@jart, gentle ping: could you please advise or delegate?

@cy89 cy89 added stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels Sep 11, 2018
@lintingxue

I am also looking for this feature; it would be great to have it.

@shkarupa-alex
Contributor

+1 to have this feature out-of-the-box

@ispirmustafa
Contributor

Evaluation runs on checkpoints. Maybe the reason you see only one evaluation point is that there is only one checkpoint. Could you please play with tf.estimator.RunConfig(save_checkpoints_steps=SOME_SMALL_VALUE_TO_VERIFY)?
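
For reference, a rough sketch of how that could be wired into the snippet from the original post (save_checkpoints_steps=500 and throttle_secs=0 are illustrative values; the feature columns and input functions are the ones defined there):

run_config = tf.estimator.RunConfig(save_checkpoints_steps=500)  # checkpoint every 500 steps
model = tf.estimator.Estimator(
    model_fn=my_dnn_regression_fn,
    model_dir="./models/temp",
    config=run_config,
    params={
        'feature_columns': feature_columns,
        'learning_rate': 0.1,
        'optimizer': tf.train.AdamOptimizer,
        'hidden_units': [20, 20, 20, 20]
    })
# With throttle_secs=0, train_and_evaluate evaluates on every new checkpoint,
# so the eval curve gets one point per 500 training steps instead of a single point.
train_spec = tf.estimator.TrainSpec(input_fn=input_train, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=input_dev, steps=None,
                                  throttle_secs=0, start_delay_secs=0)
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)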

@shkarupa-alex
Contributor

I think the issue lies deeper in the Estimator architecture.
In my case I see all the required validation metrics, but none of them for the training phase.

This is because the EstimatorSpec returned in the training stage does not contain eval_metric_ops (look at any estimator _Head).
The Estimator's internal methods that use the EstimatorSpec in the train phase (as far as I can tell) don't look at eval_metric_ops either.

If we look at the custom estimator guide, accuracy will be shown in TensorBoard only if we use a custom model_fn and log it ourselves with tf.summary.scalar('accuracy', accuracy[1]).
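
Applied to the model_fn in the original post, that would look roughly like the sketch below (untested): compute the metric in TRAIN mode as well and log it with tf.summary.scalar, using the same tag as the eval metric so TensorBoard can overlay the training curve with the eval points.

# Sketch: TRAIN branch of my_dnn_regression_fn with training summaries added.
if mode == tf.estimator.ModeKeys.TRAIN:
    # tf.metrics.* needs local variables and an update op, so for the training
    # summary we just take the square root of the per-batch average loss.
    tf.summary.scalar('rmse', tf.sqrt(average_loss))
    tf.summary.scalar('average_loss', average_loss)

    optimizer = params.get("optimizer", tf.train.AdamOptimizer)
    optimizer = optimizer(params.get("learning_rate", None))
    train_op = optimizer.minimize(
        loss=average_loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=total_loss, train_op=train_op)

The Estimator's default summary hook writes these to model_dir, while eval metrics go to model_dir/eval, so the two show up as separate runs on the same charts.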

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Feb 5, 2019
@ispirmustafa
Contributor

Hi @zjy8006, I'm closing this issue since I think checkpointing is the main reason you couldn't see more evaluation points.

@HappyBahman

HappyBahman commented Apr 14, 2019

Hi @ispirmustafa, in my experience setting the checkpoint frequency with tf.estimator.RunConfig(save_checkpoints_steps=SOME_SMALL_VALUE_TO_VERIFY) does not work either. I tried this with 1, 10, 1000, and 10000 (which was the total number of my steps), all leading to roughly the same results. Although this makes the number of checkpoints vary, the number of points in the eval plot is still 2 at most. (The image below shows my TensorBoard plot after setting save_checkpoints_steps to 1.)
[TensorBoard accuracy plot]

@leimao

leimao commented Aug 13, 2019

So what is the final solution to this? Has TensorFlow added this feature or fixed this "bug"?

@shkarupa-alex
Contributor

No, there is no easy solution.
The EstimatorSpec for training does not include metrics.

The first way you can go: compute and write the metrics manually from a custom model_fn.

The second way, which I made for myself, is an estimator wrapper.
Here it is: https://github.com/shkarupa-alex/tfmiss/blob/master/tfmiss/estimator/extenders.py (since my package has to be built with Bazel, you may just copy that particular file).
It is based on https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/estimator/add_metrics
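
For the built-in API, usage looks roughly like this (a sketch; note that tf.estimator.add_metrics by itself only attaches extra evaluation metrics, which is why the wrapper above extends the idea to the training phase):

# Sketch: wrap the estimator from the original post with extra eval metrics.
def extra_metrics(labels, predictions):
    # `predictions` is the predictions dict returned by the model_fn; this assumes
    # the EVAL-mode EstimatorSpec also exposes {'predictions': ...} like PREDICT mode does.
    pred = predictions['predictions']
    return {'mae': tf.metrics.mean_absolute_error(labels, pred)}

model_with_mae = tf.estimator.add_metrics(model, extra_metrics)
model_with_mae.evaluate(input_fn=input_dev)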

@sumuzhao

I think the correct way is to use hooks or listeners. But this is non-trivial.
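
One listener-based variant (a rough, untested sketch reusing the estimator and input functions from the original post) is to run evaluate() from a CheckpointSaverListener after every checkpoint save, which writes one eval point per checkpoint:

# Sketch: evaluate after every checkpoint via a saving listener (TF 1.x API).
class EvalAfterSaveListener(tf.train.CheckpointSaverListener):
    def __init__(self, estimator, eval_input_fn):
        self._estimator = estimator
        self._eval_input_fn = eval_input_fn

    def after_save(self, session, global_step_value):
        # evaluate() loads the checkpoint that was just written and logs
        # its metrics to model_dir/eval.
        self._estimator.evaluate(input_fn=self._eval_input_fn)

model.train(
    input_fn=input_train,
    max_steps=10000,
    saving_listeners=[EvalAfterSaveListener(model, input_dev)])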

@cowwoc

cowwoc commented Jul 18, 2021

Can someone please reopen this issue since it was never really resolved?
