
The tutorial "Logging and Monitoring Basics with tf.contrib.learn" has an error. #7669

Closed
lienhua34 opened this issue Feb 19, 2017 · 29 comments

@lienhua34

I used the code snippet from the section "Customizing the Evaluation Metrics with MetricSpec" of the tutorial Logging and Monitoring Basics with tf.contrib.learn. The code snippet is:

validation_metrics = {
    "accuracy":
        tf.contrib.learn.metric_spec.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_accuracy,
            prediction_key=tf.contrib.learn.prediction_key.PredictionKey.
            CLASSES),
    "precision":
        tf.contrib.learn.metric_spec.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_precision,
            prediction_key=tf.contrib.learn.prediction_key.PredictionKey.
            CLASSES),
    "recall":
        tf.contrib.learn.metric_spec.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_recall,
            prediction_key=tf.contrib.learn.prediction_key.PredictionKey.
            CLASSES)
}

My TensorFlow version is r1.0. When I run my program, it prints the following error:

$ python iris.py 
Traceback (most recent call last):
  File "iris.py", line 72, in <module>
    tf.app.run()
  File "/Library/Python/2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "iris.py", line 24, in main
    "accuracy": tf.contrib.learn.metric_spec.MetricSpec(
AttributeError: 'module' object has no attribute 'metric_spec'

I found that the class tf.contrib.learn.metric_spec.MetricSpec has been renamed to tf.contrib.learn.MetricSpec.

The class tf.contrib.learn.prediction_key.PredictionKey has also been renamed, to tf.contrib.learn.PredictionKey.
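
For reference, under those renames the snippet above would become something like this (a sketch based only on the renames described here; nothing else is changed):

validation_metrics = {
    "accuracy":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_accuracy,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES),
    "precision":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_precision,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES),
    "recall":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_recall,
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES)
}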

@terrytangyuan
Member

terrytangyuan commented Feb 19, 2017

@lienhua34 Yes, that's correct. The interface was sealed recently. You're welcome to submit a pull request! @martinwicke does the team have any plan to rewrite the Monitors tutorial using Hooks?

@aselle added the stat:awaiting tensorflower and type:bug labels on Feb 20, 2017
@martinwicke
Member

Sanders, didn't you already do that?

@sandersk
Contributor

No, I did update this tutorial back in December, but haven't yet switched to use SessionRunHook, as I was waiting on an equivalent canned hook for ValidationMonitor. That's not yet available, correct?

In the meantime, for an example of applying a SessionRunHook to an Estimator, you can refer to the tf.layers tutorial (https://www.tensorflow.org/tutorials/layers), which covers how to configure a LoggingTensorHook.
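
For reference, a LoggingTensorHook setup along the lines of that tutorial looks roughly like the following sketch; the tensor name "softmax_tensor", the classifier variable, and the step counts are placeholders, not code from this thread:

# Log the values of a named tensor every 50 training steps.
# "softmax_tensor" is assumed to be the name given to the softmax op in model_fn.
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(
    tensors=tensors_to_log, every_n_iter=50)

# Attach the hook when training an Estimator (here called `classifier`).
classifier.train(input_fn=train_input_fn, steps=20000, hooks=[logging_hook])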

@martinwicke
Member

martinwicke commented Feb 22, 2017 via email

@alanyuchenhou

I'm also following this tutorial and having problems with it. I'm using the latest 1.0.1 release.

Is there any working example for these monitors: CaptureVariable, PrintTensor, and ValidationMonitor?

@ghost

ghost commented Apr 17, 2017

I have the same problem, and it still does not work after changing tf.contrib.learn.metric_spec.MetricSpec / tf.contrib.learn.prediction_key.PredictionKey to tf.contrib.learn.MetricSpec / tf.contrib.learn.PredictionKey. Can anyone help?

@alanyuchenhou

@terrytangyuan @sandersk
Are these monitors (CaptureVariable, PrintTensor, ValidationMonitor) already deprecated in favor of LoggingTensorHook?

@martinwicke
Member

Yes. All Monitors are deprecated. Not all of them have a direct equivalent, but there should be hooks for the main use cases. Except ValidationMonitor, as of today.

@kickbox

kickbox commented May 15, 2017

I wanted to learn TF and started off with tf.contrib.learn from https://www.tensorflow.org/get_started/tflearn. However, I got stuck on the same problem with ValidationMonitors. I understand that they are deprecated now. I don't have the head yet to go through the new "hooks" tutorial for visualizing through TensorBoard. Is there a simple tutorial using the Iris dataset as a continuation of ~/get_started/tflearn?

@xxqcheers

When I run iris_monitors.py, I get the following error:

tf.contrib.learn.prediction_key.PredictionKey.CLASSES),

AttributeError: module 'tensorflow.contrib.learn' has no attribute 'prediction_key'

@martinwicke
Member

@ispirmustafa FYI

We should be fixing this as part of our tutorials rewrite for core estimators.

@agniszczotka

Is there any update regarding ValidationMonitor as a hook? The documentation does not seem to have been updated.

@AxenGitHub

I am in the same boat as @agniszczotka.
I have successfully used a SummarySaverHook to write some stats to file and display them in TensorBoard, but I am wondering how I can evaluate the accuracy improvement during training. Should I call estimator.evaluate with different "step" parameters to evaluate the accuracy at different moments/checkpoints (roughly as sketched below)?
Specifically, I am trying to replicate this: https://www.tensorflow.org/versions/r1.3/get_started/monitors#evaluating_every_n_steps
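
A minimal sketch of that alternating train/evaluate loop, assuming a tf.estimator.Estimator named estimator and the usual input functions (NUM_ROUNDS and the step counts are placeholders):

NUM_ROUNDS = 10  # placeholder

for _ in range(NUM_ROUNDS):
    # Train for a chunk of steps, which writes a new checkpoint...
    estimator.train(input_fn=train_input_fn, steps=1000)
    # ...then evaluate at that checkpoint to track accuracy over time.
    metrics = estimator.evaluate(input_fn=eval_input_fn)
    print(metrics)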

@agniszczotka

agniszczotka commented Aug 8, 2017

@AxenGitHub I managed to run validation during training by using Experiment;
see the docs here: https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment


experiment = tf.contrib.learn.Experiment(
    estimator=estimator, train_input_fn=training_input_fn,
    eval_input_fn=eval_input_fn, eval_steps=None, min_eval_frequency=1)
experiment.train_and_evaluate()

I am not sure how effective it is yet, but it did the job.
Could you please share your solution for implementing a validation monitor with hooks? I asked a question on Stack Overflow: https://stackoverflow.com/questions/45417502/validation-during-training-of-estimator?noredirect=1#comment77798445_45417502

@maximedb

maximedb commented Aug 29, 2017

@agniszczotka Thanks for your help. When I implement your suggestion, I get the following error:
File ".../anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 253, in train if (config.environment != run_config.Environment.LOCAL and
AttributeError: 'RunConfig' object has no attribute 'environment'
Any idea on how to get around it?

@terrytangyuan
Member

@maximedb I fixed that in #11385. It would be great if someone could take a look at that one.

@maximedb

maximedb commented Sep 8, 2017

It was resolved by adding the following lines (see here):

import os
import json

os.environ['TF_CONFIG'] = json.dumps({'environment': 'local'})
config = tf.contrib.learn.RunConfig()
estimator = tf.estimator.Estimator(..., config=config)

@chenfei-wu

How can I use early stopping in this environment?

@lelugom

lelugom commented Oct 21, 2017

@Moymix You can implement early stopping by using continuous_eval_predicate_fn, available in tf.contrib.learn.Experiment.continuous_eval_on_train_data. For instance, let's take a batch size of 10 and an early-stop count of 15. Modifying the example from the TF Layers tutorial for a bigger dataset, the code would look like this:

import numpy as np
import tensorflow as tf

BATCH_SIZE = 10
EARLY_STOP_COUNT = 15

# Model function
def model_fn(features, labels, mode):
  # ...
  eval_metric_ops = { "accuracy"  : accuracy}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

# Early stopping function
accuracy_reg = np.zeros(EARLY_STOP_COUNT)
def early_stopping(eval_results):
  # None argument for the first evaluation
  if not eval_results: 
    return True
  
  accuracy_reg[0 : EARLY_STOP_COUNT - 1] = accuracy_reg[1 : EARLY_STOP_COUNT]
  accuracy_reg[EARLY_STOP_COUNT - 1] = eval_results["accuracy"]
  counts = 0
  for i in range(0, EARLY_STOP_COUNT - 1):
    if accuracy_reg[i + 1] <= accuracy_reg[i]:
      counts += 1
  if counts == EARLY_STOP_COUNT - 1:
    print("\nEarly stopping: %s \n" % accuracy_reg)
    return False
    
  return True

# Main function
def main(unused_argv):
  #...
  estimator = tf.estimator.Estimator(
      model_fn=model_fn)  # ...
  # Train the model 
  train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"data": train_data},
    y=train_labels,
    batch_size=BATCH_SIZE,
    num_epochs=None, # Continue until training steps are finished
    shuffle=True
    )
  eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"data": validate_data},
    y=validate_labels,
    batch_size=BATCH_SIZE,
    num_epochs=1, 
    shuffle=False
    )
  experiment = tf.contrib.learn.Experiment(
    estimator=estimator,
    train_input_fn=train_input_fn,
    eval_input_fn=eval_input_fn,
    train_steps=80000,
    eval_steps=None, # evaluate runs until input is exhausted
    eval_delay_secs=180, 
    train_steps_per_iteration=1000
    )
  experiment.continuous_train_and_eval(
    continuous_eval_predicate_fn=early_stopping)  
  
  # ...

However, keep in mind that continuous_eval_predicate_fn is an experimental function, so it could change at any moment.

@ispirmustafa
Contributor

@xiejw could you PTAL re: new 1.4 utilities.

@alyaxey

alyaxey commented Nov 3, 2017

Take a look at this example:
https://stackoverflow.com/questions/46326848/early-stopping-with-experiment-tensorflow

def experiment_fn(run_config):
    estimator = tf.estimator.Estimator(...)

    train_monitors = tf.contrib.learn.monitors.ValidationMonitor(
            early_stopping_metric = "loss",
    )

    return learn.Experiment(
        estimator = estimator,
        train_input_fn = train_input_fn,
        eval_input_fn = eval_input_fn,
        train_monitors = [train_monitors])

ex = learn_runner.run(
        experiment_fn = experiment_fn,
)

@ispirmustafa removed their assignment on Nov 3, 2017
@lam

lam commented Jan 21, 2018

@agniszczotka @alyaxey Using Experiment works and enables me to run validation along with training. However, I've found that the batch size is probably encoded as a constant for the input node rather than a symbolic tensor, even though it is coded as a reshape node with a variable batch size (i.e., tf.reshape(features["x"], [-1, ...])). As a result, in the Android code, I have to allocate an array of the same size as the batch size to store the output (i.e., fetch()).

[Screenshot: screen shot 2018-01-20 at 10.15.16 pm]

@tensorflowbutler removed the stat:awaiting tensorflower label on Jan 23, 2018
@selcouthlyBlue
Contributor

Any updates on this?

@NatLun091238

@alyaxey,

train_monitors = tf.contrib.learn.monitors.ValidationMonitor(
early_stopping_metric = "loss",
)

Unfortunately, in TensorFlow 1.5.0 ValidationMonitor is not available ...
"2016-12-05, Monitors are deprecated. Please use tf.train.SessionRunHook."

@NatLun091238

@lelugom
Thank you for sharing your solution. I have implemented a similar one for a Wide & Deep model (tf.contrib.learn.DNNLinearCombinedClassifier) using tf.contrib.learn.Experiment. The model compiles well and runs on a GCP Datalab instance. However, I noticed that if I run Google ML Engine training on the same model, the training stalls after the first checkpoint (it does not produce any more checkpoints as time goes on). On the other hand, if I run training on a GCP instance using python -m trainer.task with parameters, the model converges as it should. What could be the reason for the difference between ML Engine training and ordinary training?

@selcouthlyBlue
Contributor

selcouthlyBlue commented Apr 24, 2018

I've created a ValidationHook based on the existing LoggingTensorHook.

import tensorflow as tf


class ValidationHook(tf.train.SessionRunHook):
    """Runs Estimator.evaluate() on a validation input_fn every N seconds or steps."""

    def __init__(self, model_fn, params, input_fn, checkpoint_dir,
                 every_n_secs=None, every_n_steps=None):
        self._iter_count = 0
        # A separate Estimator pointed at the training checkpoint directory,
        # so evaluate() picks up the latest checkpoint written by training.
        self._estimator = tf.estimator.Estimator(
            model_fn=model_fn,
            params=params,
            model_dir=checkpoint_dir
        )
        self._input_fn = input_fn
        self._timer = tf.train.SecondOrStepTimer(every_n_secs, every_n_steps)
        self._should_trigger = False

    def begin(self):
        self._timer.reset()
        self._iter_count = 0

    def before_run(self, run_context):
        self._should_trigger = self._timer.should_trigger_for_step(self._iter_count)

    def after_run(self, run_context, run_values):
        if self._should_trigger:
            # Evaluate against the latest checkpoint, then reset the timer.
            self._estimator.evaluate(
                self._input_fn
            )
            self._timer.update_last_triggered_step(self._iter_count)
        self._iter_count += 1

You can attach it as a hook whenever you run Estimator.train().
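
A hypothetical usage sketch (model_fn, params, the input functions, and the "/tmp/model" directory are placeholders, not from this thread):

estimator = tf.estimator.Estimator(
    model_fn=model_fn, params=params, model_dir="/tmp/model")

# Evaluate on eval_input_fn every 100 training steps.
validation_hook = ValidationHook(
    model_fn=model_fn, params=params, input_fn=eval_input_fn,
    checkpoint_dir="/tmp/model", every_n_steps=100)

estimator.train(
    input_fn=train_input_fn, hooks=[validation_hook], max_steps=10000)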

@Harshini-Gadige

Is this still an issue?

@jvishnuvardhan
Contributor

Closing this out since I understand it to be resolved, but please let me know if I'm mistaken. Thanks!
