Keras application - Tensor is not an element of this graph on eval after train #14356

Closed
damienpontifex opened this issue Nov 8, 2017 · 33 comments
Comments

@damienpontifex commented Nov 8, 2017

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.13.1
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): v1.4.0-rc1-11-g130a514 1.4.0
  • Python version: 3.6.3
  • CUDA/cuDNN version: N/A CPU only
  • Exact command to reproduce:

Describe the problem

I am using the Estimator API with tf.keras.applications.VGG16 and its output for transfer learning. When the model is run a second time, I get this exception: `TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph.`

It is raised when the eval step runs after training inside tf.estimator.train_and_evaluate; see the source code for the model and the estimator output below. It also occurs if I re-run train_and_evaluate a second time. I am running in a Jupyter notebook, and my assumption is that this is related to in-memory state: after a Kernel ➝ Restart, a training run works again without the error, but it cannot be run twice in one kernel session.

See https://github.com/damienpontifex/fastai-course/blob/master/deeplearning1/lesson1%2B3/DogsVsCats.ipynb for the full notebook; the main parts of the estimator model and its output are below:

Source code / logs

Estimator Model

```python
import tensorflow as tf

# NUM_CLASSES is defined elsewhere in the notebook.


def vgg16_model_fn(features, mode, params):

    is_training = mode == tf.estimator.ModeKeys.TRAIN

    with tf.variable_scope('vgg_base'):
        # Use a pre-trained VGG16 model and drop the top layers, as we will retrain
        # with our own dense output for our custom classes
        vgg16_base = tf.keras.applications.VGG16(
            include_top=False,
            input_shape=(224, 224, 3),
            input_tensor=features['image'],
            pooling='avg')

        # Disable training for all VGG16 layers to speed up transfer learning.
        # If the new classes are significantly different from ImageNet, it may be worth leaving trainable = True
        for layer in vgg16_base.layers:
            layer.trainable = False

        x = vgg16_base.output

    with tf.variable_scope("fc"):
        x = tf.layers.flatten(x)
        x = tf.layers.dense(x, units=4096, activation=tf.nn.relu, trainable=is_training, name='fc1')
        x = tf.layers.dense(x, units=4096, activation=tf.nn.relu, trainable=is_training, name='fc2')
        x = tf.layers.dropout(x, rate=0.5, training=is_training)

    # Finally, add a dense layer with one output (logit) per class
    with tf.variable_scope("Prediction"):
        x = tf.layers.dense(x, units=NUM_CLASSES, trainable=is_training)
        return x
```

Estimator setup

```python
# model_fn (which calls vgg16_model_fn above), run_config, params, data_input_fn
# and the TFRecord filename lists are defined elsewhere in the notebook.
dog_cat_estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    params=params
)
train_spec = tf.estimator.TrainSpec(
    input_fn=data_input_fn(train_record_filenames, num_epochs=None, batch_size=10, shuffle=True),
    max_steps=10)
eval_spec = tf.estimator.EvalSpec(
    input_fn=data_input_fn(validation_record_filenames)
)
tf.estimator.train_and_evaluate(dog_cat_estimator, train_spec, eval_spec)
```

train_and_evaluate output

```
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 600 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from /tmp/DogsVsCats/model.ckpt-1
INFO:tensorflow:Saving checkpoints for 2 into /tmp/DogsVsCats/model.ckpt.
INFO:tensorflow:loss = 0.0, step = 2
INFO:tensorflow:Saving checkpoints for 10 into /tmp/DogsVsCats/model.ckpt.
INFO:tensorflow:Loss for final step: 0.0.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1063             subfeed_t = self.graph.as_graph_element(subfeed, allow_tensor=True,
-> 1064                                                     allow_operation=False)
   1065           except Exception as e:

/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in as_graph_element(self, obj, allow_tensor, allow_operation)
   3034     with self._lock:
-> 3035       return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
   3036 

/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _as_graph_element_locked(self, obj, allow_tensor, allow_operation)
   3113       if obj.graph is not self:
-> 3114         raise ValueError("Tensor %s is not an element of this graph." % obj)
   3115       return obj

ValueError: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph.

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-12-67c818ea66c5> in <module>()
----> 1 tf.estimator.train_and_evaluate(dog_cat_estimator, train_spec, eval_spec)

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in train_and_evaluate(estimator, train_spec, eval_spec)
    428       config.task_type != run_config_lib.TaskType.EVALUATOR):
    429     logging.info('Running training and evaluation locally (non-distributed).')
--> 430     executor.run_local()
    431     return
    432 

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in run_local(self)
    614       # condition is satisfied (both checks use the same global_step value,
    615       # i.e., no race condition)
--> 616       metrics = evaluator.evaluate_and_export()
    617 
    618       if not metrics:

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in evaluate_and_export(self)
    749           name=self._eval_spec.name,
    750           checkpoint_path=latest_ckpt_path,
--> 751           hooks=self._eval_spec.hooks)
    752 
    753       if not eval_result:

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in evaluate(self, input_fn, steps, hooks, checkpoint_path, name)
    353         hooks=hooks,
    354         checkpoint_path=checkpoint_path,
--> 355         name=name)
    356 
    357   def _convert_eval_steps_to_hooks(self, steps):

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _evaluate_model(self, input_fn, hooks, checkpoint_path, name)
    808           input_fn, model_fn_lib.ModeKeys.EVAL)
    809       estimator_spec = self._call_model_fn(
--> 810           features, labels, model_fn_lib.ModeKeys.EVAL, self.config)
    811 
    812       if model_fn_lib.LOSS_METRIC_KEY in estimator_spec.eval_metric_ops:

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _call_model_fn(self, features, labels, mode, config)
    692     if 'config' in model_fn_args:
    693       kwargs['config'] = config
--> 694     model_fn_results = self._model_fn(features=features, **kwargs)
    695 
    696     if not isinstance(model_fn_results, model_fn_lib.EstimatorSpec):

<ipython-input-8-e251e8b8fccf> in model_fn(features, labels, mode, params)
      3     tf.summary.image('images', features['image'], max_outputs=6)
      4 
----> 5     logits = vgg16_model_fn(features, mode, params)
      6 
      7     # Dictionary with label as outcome with greatest probability

<ipython-input-7-93330b8a5aa6> in vgg16_model_fn(features, mode, params)
     10             input_shape=(224, 224, 3),
     11             input_tensor=features['image'],
---> 12             pooling='avg')
     13 
     14         # Disable training for all layers to increase speed for transfer learning

/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/applications/vgg16.py in VGG16(include_top, weights, input_tensor, input_shape, pooling, classes)
    199           WEIGHTS_PATH_NO_TOP,
    200           cache_subdir='models')
--> 201     model.load_weights(weights_path)
    202     if K.backend() == 'theano':
    203       layer_utils.convert_all_kernels_in_model(model)

/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py in load_weights(self, filepath, by_name)
   1097       load_weights_from_hdf5_group_by_name(f, self.layers)
   1098     else:
-> 1099       load_weights_from_hdf5_group(f, self.layers)
   1100 
   1101     if hasattr(f, 'close'):

/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py in load_weights_from_hdf5_group(f, layers)
   1484                        str(len(weight_values)) + ' elements.')
   1485     weight_value_tuples += zip(symbolic_weights, weight_values)
-> 1486   K.batch_set_value(weight_value_tuples)
   1487 
   1488 

/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py in batch_set_value(tuples)
   2404       assign_ops.append(assign_op)
   2405       feed_dict[assign_placeholder] = value
-> 2406     get_session().run(assign_ops, feed_dict=feed_dict)
   2407 
   2408 

/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    887     try:
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:
    891         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1065           except Exception as e:
   1066             raise TypeError('Cannot interpret feed_dict key as Tensor: '
-> 1067                             + e.args[0])
   1068 
   1069           if isinstance(subfeed_val, ops.Tensor):

TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph.
```

@bignamehyp (Member) commented Nov 8, 2017

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

@bignamehyp closed this Nov 8, 2017

@damienpontifex (Author) commented Nov 8, 2017

@bignamehyp I had assumed this was a bug, as it seems to occur with variables set up inside `tf.keras.applications.VGG16` rather than any I had set up myself. Thoughts?

@hsm207 (Contributor) commented Nov 9, 2017

@bignamehyp Someone already asked a similar question on Stack Overflow.

The solution is to call tf.keras.backend.clear_session() after the call to train(). However, this won't work if the user wants to use train_and_evaluate() since there is no place to call clear_session().
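As a rough illustration of this explanation, here is a framework-free toy model in Python. It assumes, in simplified form, what the traceback above points to: the Keras backend caches one module-level session that stays bound to the graph it was created in, while `tf.estimator` builds a fresh graph for each train/evaluate call, so VGG16's weight-loading placeholder ends up in the new graph but is fed to the old session. Every name here (`Graph`, `Session`, `get_session`, `clear_session`, `load_weights`, `run_demo`) is a hypothetical stand-in, not TensorFlow's implementation.

```python
class Graph:
    """Stand-in for tf.Graph: it only records which placeholders belong to it."""

    def __init__(self, name):
        self.name = name
        self.tensors = set()

    def placeholder(self, name):
        tensor = (self.name, name)
        self.tensors.add(tensor)
        return tensor


class Session:
    """Stand-in for tf.Session: bound to one graph when it is created."""

    def __init__(self, graph):
        self.graph = graph

    def run(self, feed_dict):
        for tensor in feed_dict:
            if tensor not in self.graph.tensors:
                raise ValueError("Tensor %s is not an element of this graph." % (tensor,))
        return "ok"


_DEFAULT_GRAPH = Graph("initial")
_SESSION = None  # module-level cache, like the Keras backend's global session


def new_default_graph(name):
    """Mimic the Estimator building a fresh graph for a train/evaluate call."""
    global _DEFAULT_GRAPH
    _DEFAULT_GRAPH = Graph(name)


def get_session():
    """Return the cached session, creating it (bound to the current graph) on first use."""
    global _SESSION
    if _SESSION is None:
        _SESSION = Session(_DEFAULT_GRAPH)
    return _SESSION


def clear_session():
    """Drop the cached session so the next get_session() binds to the current graph."""
    global _SESSION
    _SESSION = None


def load_weights():
    """Like batch_set_value(): make an assign placeholder in the current graph, then feed it."""
    placeholder = _DEFAULT_GRAPH.placeholder("vgg_base/Placeholder:0")
    return get_session().run({placeholder: "kernel values"})


def run_demo():
    clear_session()                  # start from an empty cache so the demo is repeatable
    outcomes = []

    new_default_graph("train")       # training builds its own graph
    outcomes.append(load_weights())  # the session is created and bound to "train"

    new_default_graph("eval")        # evaluation builds another new graph
    try:
        outcomes.append(load_weights())
    except ValueError as err:        # the cached session is still bound to "train"
        outcomes.append(str(err))

    clear_session()                  # the workaround suggested above
    outcomes.append(load_weights())  # a new session is bound to "eval"
    return outcomes


for outcome in run_demo():
    print(outcome)
```

The toy `clear_session()` only drops the cached session; the real `tf.keras.backend.clear_session()` also resets the default graph. The sketch also shows why the workaround does not fit `train_and_evaluate()`: the switch from the training graph to the evaluation graph happens inside that single call, so user code never gets a chance to clear the cache in between.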

@damienpontifex (Author) commented Nov 10, 2017

@bignamehyp does this information from @hsm207 provide any further insight? If I have to call clear_session() between runs, that seems like unexpected behaviour; wouldn't that be a bug?

I'm still not sure why it's happening, so I can't offer insight into a potential solution.

@AshutoshBachchan commented Apr 13, 2018

<tf.Tensor 'shuffle_batch:0' shape=(64, 256, 256, 1) dtype=float32> cannot be interpreted as a Tensor

@Qmoliang commented Apr 23, 2018

If you hit this problem, try calling K.clear_session() before you call your graph-building function a second time. You should also reload the model afterwards and run a test prediction on a simple dummy input. This is how I fixed my code:

```python
import numpy as np
from keras import backend as K
from keras.models import load_model

# model, get_mc_predictions, l2_normalize, a, args, X_test, X_test_noisy and Y_label
# are defined elsewhere in the original script.
uncerts_normal = get_mc_predictions(model, X_test, Y_label,
                                    batch_size=args.batch_size).var(axis=0)  # .mean(axis=1)
print(uncerts_normal.shape)
uncerts_normal1 = l2_normalize(a, axis=-1)

# Clear the old session, reload the model and run a test prediction on a dummy input
K.clear_session()
model = load_model('../data/model_%s.h5' % args.dataset)
print('testing model1:', model.predict(np.zeros((1, 28, 28, 1))))
uncerts_noisy = get_mc_predictions(model, X_test_noisy, Y_label,
                                   batch_size=args.batch_size).var(axis=0)
```

@anujgupta82 commented May 2, 2018

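K.clear_session() did not work for me.

However, what did work was this:

```python
import tensorflow as tf
from keras.applications import ResNet50  # assuming standalone Keras


def load_model():
    global model, graph
    model = ResNet50(weights="imagenet")
    # This is the key step: save the graph right after loading the model
    graph = tf.get_default_graph()
```

While predicting, use the same graph:

```python
with graph.as_default():
    preds = model.predict(image)  # `image` is the preprocessed input batch
    # ... etc
```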

@mohammedyunus009 commented May 16, 2018

This worked for me. I imported the backend with

```python
from keras import backend as K
```

and after predicting on my data I inserted this line:

```python
K.clear_session()
```

@gguibon commented Jun 1, 2018

The solution given by @anujgupta82 worked for me. Thanks a lot !

@vitojph commented Jun 8, 2018

Same problem here when trying to run inference with a pre-trained Keras model from a Flask application. Thanks @anujgupta82!

@nikhilkuria commented Jun 19, 2018

The solution from @anujgupta82 worked for me too. But can someone help me understand what is going on?

krshrimali added a commit to krshrimali/simple-keras-rest-api that referenced this issue Aug 18, 2018
Solves ValueError problem because of not saving the graph just after loading the pre-trained weights of ResNet50.

Reference: tensorflow/tensorflow#14356 (comment)

@kothiyayogesh commented Aug 21, 2018

The solution given by @Qmoliang and @MohammedYunus worked for me. Thanks :)

@uloma07 commented Sep 20, 2018

The solution by @anujgupta82 also worked for me. Saved me a lot of stress!

@lizekui commented Sep 27, 2018

Wow, thanks @anujgupta82 a lot ! Really a nice answer :-)

@alanhyue commented Oct 1, 2018

clear_session()

In my case, load_model() works the first time but not afterwards. If you are experiencing the same issue, you need to call clear_session() each time after you load the model!

@yangzhou04 commented Oct 29, 2018

Thanks @anujgupta82, works for me too!

@Rajan316 commented Nov 5, 2018

Thanks a lot, worked for me!

@conradbm commented Dec 10, 2018

In reply to @Qmoliang's suggestion above (call K.clear_session(), reload the model, and run a test prediction on a dummy input):

What if the model we trained has already been saved, and the error occurs in the loading-then-predicting phase? Any other thoughts?

@conradbm commented Dec 10, 2018

In reply to @anujgupta82's solution above (save the graph with `tf.get_default_graph()` right after loading the model, then predict inside `with graph.as_default():`):

God among men. Worked.

@atakanarikan commented Dec 15, 2018

The reason the code from @anujgupta82 works is given in this Stack Overflow answer:

Flask uses multiple threads. The problem you are running into is because the TensorFlow model is not loaded and used in the same thread. One workaround is to force TensorFlow to use the global default graph.
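The thread-related behaviour described in that answer can be sketched without TensorFlow. The toy model below assumes a simplified version of TF 1.x graph handling: a process-wide default graph, plus a per-thread stack of graphs entered with `as_default()`. A model built while some other graph is active in the loading thread does not match what a request thread sees as its default, unless the request thread re-enters the graph saved at load time, which is what the `graph = tf.get_default_graph()` / `with graph.as_default():` pattern does. All classes and functions here (`Graph`, `ToyModel`, `get_default_graph`, `run_demo`) are hypothetical stand-ins, not TensorFlow's API.

```python
import threading


class Graph:
    """Stand-in for tf.Graph; only its identity matters here."""

    def __init__(self, name):
        self.name = name

    def as_default(self):
        return _DefaultGraphContext(self)


class _ThreadLocalGraphStack(threading.local):
    """Graphs entered with as_default(); each thread gets its own, initially empty, stack."""

    def __init__(self):
        self.graphs = []


_GRAPH_STACK = _ThreadLocalGraphStack()
_GLOBAL_DEFAULT_GRAPH = Graph("global default")


class _DefaultGraphContext:
    """Context manager that makes a graph the default for the current thread only."""

    def __init__(self, graph):
        self.graph = graph

    def __enter__(self):
        _GRAPH_STACK.graphs.append(self.graph)
        return self.graph

    def __exit__(self, exc_type, exc, tb):
        _GRAPH_STACK.graphs.pop()
        return False


def get_default_graph():
    """Innermost graph entered in this thread, else the process-wide default."""
    if _GRAPH_STACK.graphs:
        return _GRAPH_STACK.graphs[-1]
    return _GLOBAL_DEFAULT_GRAPH


class ToyModel:
    """Remembers the graph it was built in; predicting under another graph fails."""

    def __init__(self):
        self.graph = get_default_graph()

    def predict(self, x):
        if get_default_graph() is not self.graph:
            raise ValueError("Tensor is not an element of this graph.")
        return 2 * x


def run_demo():
    with Graph("model graph").as_default():
        model = ToyModel()           # built while a non-global graph is active in this thread
        graph = get_default_graph()  # save the graph right after loading the model

    results = {}

    def naive_request():
        try:
            results["naive"] = model.predict(21)
        except ValueError as err:
            results["naive"] = str(err)

    def fixed_request():
        with graph.as_default():     # re-enter the saved graph in the request thread
            results["fixed"] = model.predict(21)

    for handler in (naive_request, fixed_request):
        worker = threading.Thread(target=handler)  # e.g. a Flask request thread
        worker.start()
        worker.join()
    return results


print(run_demo())
# {'naive': 'Tensor is not an element of this graph.', 'fixed': 42}
```

Saving the graph object once and re-entering it per request is cheap, because `as_default()` only pushes a reference onto the current thread's stack; nothing is copied or rebuilt.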

@ironmanciti commented Dec 16, 2018

In reply to @anujgupta82's solution above (save the graph with `tf.get_default_graph()` right after loading the model, then predict inside `with graph.as_default():`):

Thanks. I struggled with the same problem for half a day and solved it with your suggestion.

@anujshah1003 commented Dec 21, 2018

The solution by @anujgupta82 worked for me, thanks.

@rahulsinghpatel commented Feb 12, 2019

The approach provided by mohamedadaly is described here with an example; see this link:
https://interviewbubble.com/typeerror-cannot-interpret-feed_dict-key-as-tensor-tensor-tensor-is-not-an-element-of-this-graph/

@shaoeChen commented Feb 15, 2019

Hi, I used your function, but it doesn't work for me: there is no error message and no response at all.

```python
# load_keras_model.py
import os

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.models import load_model

# file_path and NonResourceException are defined elsewhere in the application.


class LoadKerasModel:
    model = None
    graph = None

    def __init__(self):
        self.keras_resource()
        self.init_model()

    def init_model(self):
        self.graph = tf.get_default_graph()
        self.model = load_model(file_path)
        # Test prediction on a dummy input right after loading
        self.model.predict(np.ones((1, 1, 1, 1)))

    def keras_resource(self):
        num_cores = 4

        if os.getenv('TENSORFLOW_VERSION') == 'GPU':
            num_gpu = 1
            num_cpu = 1
        elif os.getenv('TENSORFLOW_VERSION') == 'CPU':
            num_gpu = 0
            num_cpu = 1
        else:
            raise NonResourceException()

        config = tf.ConfigProto(intra_op_parallelism_threads=num_cores,
                                inter_op_parallelism_threads=num_cores, allow_soft_placement=True,
                                device_count={'CPU': num_cpu, 'GPU': num_gpu})
        config.gpu_options.allow_growth = True

        session = tf.Session(config=config)
        K.set_session(session)

    def predict_target(self, img_generator):
        with self.graph.as_default():
            predict = self.model.predict_generator(
                img_generator,
                steps=len(img_generator),
                verbose=1
            )
        return predict


load_keras_model = LoadKerasModel()
```

My environment:

  • Python 3.5
  • Keras 2.24
  • TensorFlow 1.12

The uWSGI command I use to start the app:

```shell
uwsgi --http-socket 0.0.0.0:5001 --wsgi-file wsgi.py --callable app --http-enable-proxy-protocol --processes 4 --threads 2 --stats 0.0.0.0:5002
```

When I start the application with `flask run`, it works very well, but it does not work under uWSGI. The Flask app is created with a factory function, and load_keras_model is imported while the app is initialized.
I'm not sure what I'm doing wrong, because there is no error message at all. I hope somebody can help me, thanks.

@ArashHosseini commented Feb 22, 2019

This works for me. @shaoeChen, how does it work for you? This approach does not need a clear_session() call and at the same time lets you pass your own session configuration.

```python
# load_keras_model.py
import os

import numpy as np
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
from keras.models import load_model

# file_path and NonResourceException are defined elsewhere in the application.


class LoadKerasModel:
    model = None
    graph = None

    def __init__(self):
        config = self.keras_resource()
        self.init_model(config)

    def init_model(self, _config, *args):
        # Create a session with the resource config, remember its graph,
        # and register it as the Keras session before loading the model
        session = tf.Session(config=_config)
        self.graph = session.graph
        set_session(session)
        self.model = load_model(file_path)

    def keras_resource(self):
        num_cores = 4

        if os.getenv('TENSORFLOW_VERSION') == 'GPU':
            num_gpu = 1
            num_cpu = 1
        elif os.getenv('TENSORFLOW_VERSION') == 'CPU':
            num_gpu = 0
            num_cpu = 1
        else:
            raise NonResourceException()

        config = tf.ConfigProto(intra_op_parallelism_threads=num_cores,
                                inter_op_parallelism_threads=num_cores, allow_soft_placement=True,
                                device_count={'CPU': num_cpu, 'GPU': num_gpu})
        config.gpu_options.allow_growth = True

        return config

    def predict_target(self, img_generator):
        with self.graph.as_default():
            predict = self.model.predict_generator(
                img_generator,
                steps=len(img_generator),
                verbose=1
            )
        return predict


load_keras_model = LoadKerasModel()
load_keras_model.predict_target(np.ones((1, 1, 1, 1)))  # replace with your image generator
```

@shaoeChen commented Feb 25, 2019

@ArashHosseini
Hi, I tried it and got the same result: no response and no error, and the browser just keeps loading. This happens even when I run uWSGI with one process and one thread:

```shell
uwsgi --http-socket 0.0.0.0:5001 --wsgi-file wsgi.py --callable app --http-enable-proxy-protocol --processes 1 --threads 1 --stats 0.0.0.0:5002
```

Now I'm trying gunicorn instead, as below, and I get the predict_generator response in five seconds:

```shell
gunicorn --thread=2 --workers=1 wsgi:app -b 0.0.0.0:5001
```

It works well for me; I think I need to study how to use uWSGI correctly.
Thanks for your guidance.

@ArashHosseini commented Feb 25, 2019

@shaoeChen, thanks for the reply. I edited the code: the set_session call during initialization was missing. Now GPU memory consumption should be significantly lower. Let me know whether that (the GPU config) works in your case, thanks.

@shaoeChen commented Mar 8, 2019

@ArashHosseini, sorry for the late reply.
Now I notice that the GPU resources are not under my control. Originally the app used 1355MB, but now it uses all 1888MB, as shown below:
(screenshot)

Original GPU memory:
(screenshot)

Hi @ArashHosseini, I'm sorry.
I think I missed a setting; now I'm sure the GPU memory usage is the same as before, as shown below:
(screenshot)

Thanks for your advice.

@digglife commented Apr 15, 2019

In reply to @mohammedyunus009's suggestion above (import the backend with `from keras import backend as K` and call `K.clear_session()` after predicting):

Thank you!

@joaofbsm commented May 31, 2019

I encountered this error in some code I was working with, and none of the above answers worked for me.

The problem turned out to be that the code mixed keras and tensorflow.keras, and calling keras.backend.clear_session() instead of tensorflow.keras.backend.clear_session() broke everything after the network was trained for the first time.
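A minimal toy model of this pitfall, assuming only that standalone `keras` and `tf.keras` each keep their own module-level backend state, so clearing one backend leaves the other's cached session untouched. The `Backend` class is a hypothetical stand-in for either backend module, not real Keras code.

```python
class Backend:
    """Stand-in for a backend module (keras.backend or tf.keras.backend)."""

    def __init__(self, name):
        self.name = name
        self._session = None  # each backend caches its own session

    def get_session(self):
        if self._session is None:
            self._session = object()  # placeholder for a real tf.Session
        return self._session

    def clear_session(self):
        self._session = None


keras_backend = Backend("keras.backend")
tf_keras_backend = Backend("tf.keras.backend")

stale = tf_keras_backend.get_session()          # the model was built with tf.keras
keras_backend.clear_session()                   # ...but the standalone keras backend is cleared
print(tf_keras_backend.get_session() is stale)  # True: tf.keras still holds the old session

tf_keras_backend.clear_session()                # clear the backend the model actually uses
print(tf_keras_backend.get_session() is stale)  # False: a fresh session is created
```

In a real project the simpler choice is usually to import everything from one package (either `keras` or `tensorflow.keras`), so that `clear_session()` and the model always share the same backend state.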

@chusri commented Jun 12, 2019

@anujgupta82 you saved my day

MbProg added a commit to MbProg/BughouseAlphaZero that referenced this issue Jul 9, 2019

@JansonLiao commented Jul 10, 2019

In reply to @joaofbsm's comment above about mixing keras and tensorflow.keras (calling keras.backend.clear_session() instead of tensorflow.keras.backend.clear_session()):

Thanks, I had the same problem as you; following your answer, I fixed it.
