
The tutorial "Logging and Monitoring Basics with tf.contrib.learn" has an error. #7669

Closed
lienhua34 opened this issue Feb 19, 2017 · 29 comments

@lienhua34

I used the code snippet from the section "Customizing the Evaluation Metrics with MetricSpec" of the tutorial Logging and Monitoring Basics with tf.contrib.learn. The code snippet is:

validation_metrics = {
    "accuracy":
        tf.contrib.learn.metric_spec.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_accuracy,
            prediction_key=tf.contrib.learn.prediction_key.PredictionKey.
            CLASSES),
    "precision":
        tf.contrib.learn.metric_spec.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_precision,
            prediction_key=tf.contrib.learn.prediction_key.PredictionKey.
            CLASSES),
    "recall":
        tf.contrib.learn.metric_spec.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_recall,
            prediction_key=tf.contrib.learn.prediction_key.PredictionKey.
            CLASSES)
}

My TensorFlow version is r1.0. When I run my program, it prints the following error:

$ python iris.py 
Traceback (most recent call last):
  File "iris.py", line 72, in <module>
    tf.app.run()
  File "/Library/Python/2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "iris.py", line 24, in main
    "accuracy": tf.contrib.learn.metric_spec.MetricSpec(
AttributeError: 'module' object has no attribute 'metric_spec'

I found that the class tf.contrib.learn.metric_spec.MetricSpec has been renamed to tf.contrib.learn.MetricSpec.

The class tf.contrib.learn.prediction_key.PredictionKey has also been renamed, to tf.contrib.learn.PredictionKey.
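
For reference, under those renames the snippet above would become something like this (a sketch based only on the renames described here; nothing else is changed):

validation_metrics = {
    "accuracy":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_accuracy,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES),
    "precision":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_precision,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES),
    "recall":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_recall,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES)
}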

@terrytangyuan
Member

terrytangyuan commented Feb 19, 2017

@lienhua34 Yes, that's correct. The interface was sealed recently. You're welcome to submit a pull request! @martinwicke does the team have any plan to rewrite the Monitors tutorial using Hooks?

@aselle added the stat:awaiting tensorflower and type:bug labels on Feb 20, 2017
@martinwicke
Member

Sanders, didn't you already do that?

@sandersk
Contributor

No, I did update this tutorial back in December, but haven't yet switched to use SessionRunHook, as I was waiting on an equivalent canned hook for ValidationMonitor. That's not yet available, correct?

In the meantime, for an example of applying a SessionRunHook to an Estimator, you can refer to the tf.layers tutorial (https://www.tensorflow.org/tutorials/layers), which covers how to configure a LoggingTensorHook.
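
For reference, a LoggingTensorHook setup along the lines of that tutorial looks roughly like the following sketch; the tensor name "softmax_tensor", the classifier variable, and the step counts are placeholders, not code from this thread:

# Log the values of a named tensor every 50 training steps.
# "softmax_tensor" is assumed to be the name given to the softmax op in model_fn.
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(
    tensors=tensors_to_log, every_n_iter=50)

# Attach the hook when training an Estimator (here called `classifier`).
classifier.train(input_fn=train_input_fn, steps=20000, hooks=[logging_hook])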

@martinwicke
Member

martinwicke commented Feb 22, 2017 via email

@alanyuchenhou

I'm also following this tutorial and having problems with it. I'm using the latest 1.0.1 release.

Is there any working example for these monitors: CaptureVariable, PrintTensor, and ValidationMonitor?

@ghost

ghost commented Apr 17, 2017

I have the same problem, and it still does not work after changing tf.contrib.learn.metric_spec.MetricSpec / tf.contrib.learn.prediction_key.PredictionKey to tf.contrib.learn.MetricSpec / tf.contrib.learn.PredictionKey. Can anyone help?

@alanyuchenhou

@terrytangyuan @sandersk
Are these monitors (CaptureVariable, PrintTensor, ValidationMonitor) already deprecated in favor of LoggingTensorHook?

@martinwicke
Member

Yes. All Monitors are deprecated. Not all of them have a direct equivalent, but there should be hooks for the main use cases. Except ValidationMonitor, as of today.

@kickbox

kickbox commented May 15, 2017

I wanted to learn TF and started off with tf.contrib.learn from https://www.tensorflow.org/get_started/tflearn. However, I got stuck on the same problem with ValidationMonitors. I understand that they are deprecated now. I don't have the head yet to go through the new "hooks" tutorial for visualizing through TensorBoard. Is there a simple tutorial using the Iris dataset as a continuation of ~/get_started/tflearn?

@xxqcheers

When I run iris_monitors.py, I get the following error:

tf.contrib.learn.prediction_key.PredictionKey.CLASSES),

AttributeError: module 'tensorflow.contrib.learn' has no attribute 'prediction_key'

@martinwicke
Member

@ispirmustafa FYI

We should be fixing this as part of our tutorials rewrite for core estimators.

@agniszczotka

Is there any update regarding ValidationMonitor as a hook? The documentation does not seem to have been updated.

@AxenGitHub

I am in the same boat as @agniszczotka.
I have successfully used a SummarySaverHook to write some stats to file and display them in TensorBoard, but I am wondering how I can evaluate the accuracy improvement during training. Should I call estimator.evaluate with different "step" parameters to evaluate the accuracy at different moments/checkpoints (roughly as sketched below)?
Specifically, I am trying to replicate this: https://www.tensorflow.org/versions/r1.3/get_started/monitors#evaluating_every_n_steps
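
A minimal sketch of that alternating train/evaluate loop, assuming a tf.estimator.Estimator named estimator and the usual input functions (NUM_ROUNDS and the step counts are placeholders):

NUM_ROUNDS = 10  # placeholder

for _ in range(NUM_ROUNDS):
    # Train for a chunk of steps, which writes a new checkpoint...
    estimator.train(input_fn=train_input_fn, steps=1000)
    # ...then evaluate at that checkpoint to track accuracy over time.
    metrics = estimator.evaluate(input_fn=eval_input_fn)
    print(metrics)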

@agniszczotka

agniszczotka commented Aug 8, 2017

@AxenGitHub I managed to run validation during training by using Experiment;
see the docs here: https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment


experiment = tf.contrib.learn.Experiment(
    estimator=estimator, train_input_fn=training_input_fn,
    eval_input_fn=eval_input_fn, eval_steps=None, min_eval_frequency=1)
experiment.train_and_evaluate()

I am not sure how effective it is yet, but it did the job.
Could you please share your solution for implementing a validation monitor with hooks? I asked a question on Stack Overflow: https://stackoverflow.com/questions/45417502/validation-during-training-of-estimator?noredirect=1#comment77798445_45417502

@maximedb

maximedb commented Aug 29, 2017

@agniszczotka Thanks for your help. When I implement your suggestion, I get the following error:
File ".../anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 253, in train if (config.environment != run_config.Environment.LOCAL and
AttributeError: 'RunConfig' object has no attribute 'environment'
Any idea on how to get around it?

@terrytangyuan
Member

@maximedb I fixed that in #11385. It would be great if someone could take a look at that one.

@maximedb

maximedb commented Sep 8, 2017

It was resolved by adding the following lines (see here):

import os
import json

os.environ['TF_CONFIG'] = json.dumps({'environment': 'local'})
config = tf.contrib.learn.RunConfig()
estimator = tf.estimator.Estimator(..., config=config)

@chenfei-wu

How can I use early stopping in this environment?

@lelugom

lelugom commented Oct 21, 2017

@Moymix You can implement early stopping by using continuous_eval_predicate_fn, available in tf.contrib.learn.Experiment.continuous_eval_on_train_data. For instance, let's take a batch size of 10 and an early-stop count of 15. Modifying the example from the TF Layers tutorial for a bigger dataset, the code would look like this:

import numpy as np
import tensorflow as tf

BATCH_SIZE = 10
EARLY_STOP_COUNT = 15

# Model function
def model_fn(features, labels, mode):
  # ...
  eval_metric_ops = { "accuracy"  : accuracy}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

# Early stopping function
accuracy_reg = np.zeros(EARLY_STOP_COUNT)
def early_stopping(eval_results):
  # None argument for the first evaluation
  if not eval_results: 
    return True
  
  accuracy_reg[0 : EARLY_STOP_COUNT - 1] = accuracy_reg[1 : EARLY_STOP_COUNT]
  accuracy_reg[EARLY_STOP_COUNT - 1] = eval_results["accuracy"]
  counts = 0
  for i in range(0, EARLY_STOP_COUNT - 1):
    if accuracy_reg[i + 1] <= accuracy_reg[i]:
      counts += 1
  if counts == EARLY_STOP_COUNT - 1:
    print("\nEarly stopping: %s \n" % accuracy_reg)
    return False
    
  return True

# Main function
def main(unused_argv):
  #...
  estimator = tf.estimator.Estimator(
      model_fn=model_fn)  # ...
  # Train the model 
  train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"data": train_data},
    y=train_labels,
    batch_size=BATCH_SIZE,
    num_epochs=None, # Continue until training steps are finished
    shuffle=True
    )
  eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"data": validate_data},
    y=validate_labels,
    batch_size=BATCH_SIZE,
    num_epochs=1, 
    shuffle=False
    )
  experiment = tf.contrib.learn.Experiment(
    estimator=estimator,
    train_input_fn=train_input_fn,
    eval_input_fn=eval_input_fn,
    train_steps=80000,
    eval_steps=None, # evaluate runs until input is exhausted
    eval_delay_secs=180, 
    train_steps_per_iteration=1000
    )
  experiment.continuous_train_and_eval(
    continuous_eval_predicate_fn=early_stopping)  
  
  # ...

However, keep in mind that continuous_eval_predicate_fn is an experimental function, so it could change at any moment.

@ispirmustafa
Contributor

@xiejw could you PTAL re: new 1.4 utilities.

@alyaxey

alyaxey commented Nov 3, 2017

Take a look at this example:
https://stackoverflow.com/questions/46326848/early-stopping-with-experiment-tensorflow

def experiment_fn(run_config):
    estimator = tf.estimator.Estimator(...)

    train_monitors = tf.contrib.learn.monitors.ValidationMonitor(
            early_stopping_metric = "loss",
    )

    return learn.Experiment(
        estimator = estimator,
        train_input_fn = train_input_fn,
        eval_input_fn = eval_input_fn,
        train_monitors = [train_monitors])

ex = learn_runner.run(
        experiment_fn = experiment_fn,
)

@ispirmustafa removed their assignment on Nov 3, 2017
@lam

lam commented Jan 21, 2018

@agniszczotka @alyaxey Using Experiment works and enables me to run validation along with training. However, I've found that the batch size is probably encoded as a constant for the input node rather than a symbolic tensor, even though it is coded as a reshape node with a variable batch size (i.e., tf.reshape(features["x"], [-1, ...])). As a result, in the Android code, I have to allocate an array of the same size as the batch size to store the output (i.e., fetch()).

[Screenshot: screen shot 2018-01-20 at 10.15.16 pm]

@tensorflowbutler removed the stat:awaiting tensorflower label on Jan 23, 2018
@selcouthlyBlue
Contributor

Any updates on this?

@NatLun091238

@alyaxey,

train_monitors = tf.contrib.learn.monitors.ValidationMonitor(
early_stopping_metric = "loss",
)

Unfortunately, in TensorFlow 1.5.0 ValidationMonitor is not available ...
"2016-12-05, Monitors are deprecated. Please use tf.train.SessionRunHook."

@NatLun091238

@lelugom
Thank you for sharing your solution. I have implemented a similar one for a Wide & Deep model (tf.contrib.learn.DNNLinearCombinedClassifier) using tf.contrib.learn.Experiment. The model compiles well and runs on a GCP Datalab instance. However, I noticed that if I run Google ML Engine training on the same model, the training stalls after the first checkpoint (it does not produce any more checkpoints as time goes on). On the other hand, if I run training on a GCP instance using python -m trainer.task with parameters, the model converges as it should. What could be the reason for the difference between ML Engine training and ordinary training?

@selcouthlyBlue
Contributor

selcouthlyBlue commented Apr 24, 2018

I've created a ValidationHook based on the existing LoggingTensorHook.

import tensorflow as tf


class ValidationHook(tf.train.SessionRunHook):
    """Runs Estimator.evaluate() on a validation input_fn every N seconds or steps."""

    def __init__(self, model_fn, params, input_fn, checkpoint_dir,
                 every_n_secs=None, every_n_steps=None):
        self._iter_count = 0
        # A separate Estimator pointed at the training checkpoint directory,
        # so evaluate() picks up the latest checkpoint written by training.
        self._estimator = tf.estimator.Estimator(
            model_fn=model_fn,
            params=params,
            model_dir=checkpoint_dir
        )
        self._input_fn = input_fn
        self._timer = tf.train.SecondOrStepTimer(every_n_secs, every_n_steps)
        self._should_trigger = False

    def begin(self):
        self._timer.reset()
        self._iter_count = 0

    def before_run(self, run_context):
        self._should_trigger = self._timer.should_trigger_for_step(self._iter_count)

    def after_run(self, run_context, run_values):
        if self._should_trigger:
            # Evaluate against the latest checkpoint, then reset the timer.
            self._estimator.evaluate(
                self._input_fn
            )
            self._timer.update_last_triggered_step(self._iter_count)
        self._iter_count += 1

You can attach it as a hook whenever you run Estimator.train().
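
A hypothetical usage sketch (model_fn, params, the input functions, and the "/tmp/model" directory are placeholders, not from this thread):

estimator = tf.estimator.Estimator(
    model_fn=model_fn, params=params, model_dir="/tmp/model")

# Evaluate on eval_input_fn every 100 training steps.
validation_hook = ValidationHook(
    model_fn=model_fn, params=params, input_fn=eval_input_fn,
    checkpoint_dir="/tmp/model", every_n_steps=100)

estimator.train(
    input_fn=train_input_fn, hooks=[validation_hook], max_steps=10000)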

@Harshini-Gadige

Is this still an issue?

@jvishnuvardhan
Contributor

Closing this out since I understand it to be resolved, but please let me know if I'm mistaken. Thanks!
