
freeze_graph not initializing tables #8665

Closed
sseveran opened this issue Mar 23, 2017 · 36 comments

@sseveran

I am not sure if this is an actual bug or if it's expected but undocumented behavior.

I have a model that uses multiple lookup tables created via string_to_index. I freeze the model like so:
bazel-bin/tensorflow/python/tools/freeze_graph --input_graph=/tmp/tf/graph.pbtxt --input_checkpoint=/tmp/tf/model.ckpt-0 --output_graph=/tmp/ticker_classifier.pb --output_node_names=sigmoid --initializer_nodes=init_all_tables

However, when the model is reloaded and I attempt to run it, I get the error "Table not initialized." I get exactly the same output file whether or not I specify initializer_nodes. The behavior I was expecting was for the frozen model to contain the lookup tables in a ready-to-use state for inference, but I don't know whether that is an unreasonable expectation.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

I have not seen any issues related to this. I previously posted about this here http://stackoverflow.com/questions/42916383/how-to-properly-freeze-a-tensorflow-graph-containing-a-lookuptable

Environment info

Operating System: MacOS and Linux (CentOS 7)

Installed version of CUDA and cuDNN: None

If installed from source, provide

  1. The commit hash (git rev-parse HEAD) 07bb8ea
  2. Build label: 0.4.5
    Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
    Build time: Thu Mar 16 12:19:38 2017 (1489666778)
    Build timestamp: 1489666778
    Build timestamp as int: 1489666778

If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)

I have been unable to make a small example but I can spend more time on it if needed.

What other attempted solutions have you tried?

The workaround is to add init_all_tables to the output nodes and then run init_all_tables before feeding the session examples for inference. This has the side effect of requiring the source files for the tables to be distributed, at the same path originally used for training, to every node that runs inference.

@aselle
Contributor

aselle commented Mar 28, 2017

@petewarden, do you have any insight into this?

@aselle aselle added stat:awaiting response Status - Awaiting response from author stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug and removed stat:awaiting response Status - Awaiting response from author labels Mar 28, 2017
@ngoel17

ngoel17 commented Apr 7, 2017

Freezing_problem.zip
We have a similar problem, related to saving the operations as constants while freezing, using the algorithm in the attached files.
The problem can be easily replicated by trying to freeze the graphs generated by the toy example in textsum https://github.com/tensorflow/models/tree/master/textsum .

Models are trained with the following command:
bazel-bin/textsum/seq2seq_attention --mode=train --article_key=article --abstract_key=abstract --data_path=textsum/data/data --vocab_path=textsum/data/vocab --log_root=textsum/log_root --train_dir=textsum/log_root/train

Then freeze_2_textsum.py is called with the following syntax:
python freeze_2_textsum.py
Command in our case was:
python freeze_2_textsum.py --model_folder=./log_root/ --outputnodes=global_step

In this case, we are able to find the saved constants in the frozen_model.pb file.
But when we try the same syntax on the trained graph in our own project, we cannot find the constants in the frozen_model.pb file, even though the freeze_2_textsum.py script prints the log message "13 ops were converted to constants".
This problem leads to the following error while running the session in our test script:
"Attempting to use uninitialized value model/generate_embedding_RNN_output/BiRNN/BW/BasicLSTMCell/Linear/Bias"
cmd line for test script:

python test_tf_frozen_txtsum.py

@itsmeolivia
Contributor

@petewarden is this still an issue?

@petewarden
Contributor

One solution to this is to have the freeze_graph.py script call the initializers explicitly, by specifying the --initializer_nodes argument:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py#L242

You'll need to know the name of the node that initializes your tables or other data structures though.

@sseveran
Author

I believe I tried that, but it's always possible I did it wrong. Ultimately I have moved away from lookup tables since the Java bindings still don't support passing arrays of strings. If that is added I will be interested in this issue again.

@buffxz

buffxz commented Jul 7, 2017

Seeing the same problem here. It doesn't seem like --initializer_nodes is doing anything. @petewarden

@buffxz

buffxz commented Jul 7, 2017

I have a node:
init_table = tf.tables_initializer(name='init_all_tables')
and when I freeze the graph, I specify --initializer_nodes=init_all_tables. Is this what I need to do?

@petewarden
Contributor

It's possible there's a bug in the way that --initializer_nodes is being handled; you could add a logging line here to verify it is happening in your case:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py#L107

@buffxz

buffxz commented Jul 19, 2017

Yeah, I verified that that line of code is being called. But inside the convert_variables_to_constants call, the only place that uses sess is https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/graph_util_impl.py#L220:

  if variable_names:
    returned_variables = sess.run(variable_names)

It doesn't seem like anything about the initialized table is actually read from sess, so I would imagine the initialized table is not used at all during freezing?
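That reading matches what the code does: the session is consulted only for the current values of Variable nodes, which then become Const nodes; everything else is copied verbatim. Here is a plain-Python sketch of that narrow role (the node structure and names are hypothetical, not TensorFlow's actual protos); table contents live in an opaque in-process resource, so nothing about an initialized table is captured.

```python
def convert_variables_to_constants(sess_values, nodes):
    """Freeze a toy graph: sess_values maps Variable names to their values."""
    frozen = {}
    for name, node in nodes.items():
        if node['op'] == 'VariableV2':
            # Only variables are baked into the output as constants.
            frozen[name] = {'op': 'Const', 'value': sess_values[name]}
        else:
            # Every other node, including the table op, is copied unchanged.
            frozen[name] = dict(node)
    return frozen

nodes = {
    'weights': {'op': 'VariableV2'},
    'hash_table': {'op': 'HashTableV2'},  # a resource handle, carries no payload
}
frozen = convert_variables_to_constants({'weights': [1.0, 2.0]}, nodes)
print(frozen['weights']['op'])     # Const
print(frozen['hash_table']['op'])  # HashTableV2 -- still needs initialization at load time
```

This is why running the initializer during freezing has no visible effect on the output file: the table's state never flows through the conversion.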

@buffxz

buffxz commented Aug 2, 2017

Any update for this issue?

@buffxz

buffxz commented Aug 7, 2017

@petewarden ^^

@buffxz

buffxz commented Aug 20, 2017

Ping. Can anyone help look into this bug?

@buffxz

buffxz commented Sep 9, 2017

any updates?

@MrGeva

MrGeva commented Oct 10, 2017

I have the same problem, I get this error:

FailedPreconditionError (see above for traceback): Table not initialized.

I am trying to store my graph as a protobuf and then load it, when I run the graph I get the error above.

However, when I run the graph directly (without storing it to a .pb first) it works fine.

with tf.gfile.GFile(frozen_file_name, "wb") as f:
    f.write(frozen_graph_def.SerializeToString())
g = load_graph(frozen_file_name)
with tf.Session(graph=g, config=utils.get_config_proto()) as sess1:
    prefix = 'prefix/'
    sess1.run(
        prefix + infer_model.iterator.initializer.name,
        feed_dict={
            prefix + infer_model.src_placeholder.name: infer_data,
            prefix + infer_model.batch_size_placeholder.name: hparams.infer_batch_size
            })

@jkiske

jkiske commented Oct 12, 2017

Here is a minimal example of this issue:

import os

import tensorflow as tf
from tensorflow.python.framework.graph_util import convert_variables_to_constants
from tensorflow.python.ops.lookup_ops import HashTable, KeyValueTensorInitializer

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

OUTPUT_FOLDER = '/tmp'
OUTPUT_NAME = 'hash_table.pb'
OUTPUT_NAMES = ['output']


def build_graph():
    d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
    init = KeyValueTensorInitializer(d.keys(), d.values())
    hash_table = HashTable(init, default_value=-1)
    data = tf.placeholder(tf.string, (None,), name='data')
    values = hash_table.lookup(data)
    output = tf.identity(values * 2, 'output')


def freeze_graph():
    with tf.Graph().as_default() as graph:
        build_graph()

        with tf.Session(graph=graph) as sess:
            sess.run(tf.tables_initializer())
            print(sess.run('output:0', feed_dict={'data:0': ['a', 'b', 'c', 'd', 'e']}))

            frozen_graph = convert_variables_to_constants(sess, sess.graph_def, OUTPUT_NAMES)
            tf.train.write_graph(frozen_graph, OUTPUT_FOLDER, OUTPUT_NAME, as_text=False)


def load_frozen_graph():
    with open(os.path.join(OUTPUT_FOLDER, OUTPUT_NAME), 'rb') as f:
        output_graph_def = tf.GraphDef()
        output_graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(output_graph_def, name='')
        with tf.Session(graph=graph) as sess:
            print(sess.run('output:0', feed_dict={'data:0': ['a', 'b', 'c', 'd', 'e']}))


if __name__ == '__main__':
    freeze_graph()
    load_frozen_graph()

Output:

[ 2  4  6  8 -2]
Converted 0 variables to const ops.
Traceback (most recent call last):
  File "/home/kiske/hashmap_test.py", line 48, in <module>
    load_frozen_graph()
  File "/home/kiske/hashmap_test.py", line 43, in load_frozen_graph
    print sess.run('output:0', feed_dict={'data:0': ['a', 'b', 'c', 'd', 'e']})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table not initialized.
	 [[Node: hash_table_Lookup = LookupTableFindV2[Tin=DT_STRING, Tout=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](hash_table, _arg_data_0_0, hash_table/Const)]]

Caused by op u'hash_table_Lookup', defined at:
  File "/home/kiske/hashmap_test.py", line 48, in <module>
    load_frozen_graph()
  File "/home/kiske/hashmap_test.py", line 41, in load_frozen_graph
    tf.import_graph_def(output_graph_def, name='')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 313, in import_graph_def
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): Table not initialized.
	 [[Node: hash_table_Lookup = LookupTableFindV2[Tin=DT_STRING, Tout=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](hash_table, _arg_data_0_0, hash_table/Const)]]

It seems like convert_variables_to_constants is stripping out init_all_tables, key_value_init, key_value_init/keys, and key_value_init/values nodes.

Any help would be appreciated.

@jkiske

jkiske commented Oct 12, 2017

Adding init_all_tables to the list of names to export fixes this issue.

import os

import tensorflow as tf
from tensorflow.python.framework.graph_util import convert_variables_to_constants
from tensorflow.python.ops.lookup_ops import HashTable, KeyValueTensorInitializer

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

OUTPUT_FOLDER = '/tmp'
OUTPUT_NAME = 'hash_table.pb'
OUTPUT_NAMES = ['graph/output', 'init_all_tables']


def build_graph():
    d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
    init = KeyValueTensorInitializer(d.keys(), d.values())
    hash_table = HashTable(init, default_value=-1)
    data = tf.placeholder(tf.string, (None,), name='data')
    values = hash_table.lookup(data)
    output = tf.identity(values * 2, 'output')


def freeze_graph():
    with tf.Graph().as_default() as graph:
        with tf.name_scope('graph'):
            build_graph()

        with tf.Session(graph=graph) as sess:
            sess.run(tf.tables_initializer())
            print(sess.run('graph/output:0', feed_dict={'graph/data:0': ['a', 'b', 'c', 'd', 'e']}))
            frozen_graph = convert_variables_to_constants(sess, sess.graph_def, OUTPUT_NAMES)
            tf.train.write_graph(frozen_graph, OUTPUT_FOLDER, OUTPUT_NAME, as_text=False)


def load_frozen_graph():
    with open(os.path.join(OUTPUT_FOLDER, OUTPUT_NAME), 'rb') as f:
        output_graph_def = tf.GraphDef()
        output_graph_def.ParseFromString(f.read())

    with tf.Graph().as_default() as graph:
        tf.import_graph_def(output_graph_def, name='')
        with tf.Session(graph=graph) as sess:
            try:
                sess.run(graph.get_operation_by_name('init_all_tables'))
            except KeyError:
                pass
            print(sess.run('graph/output:0', feed_dict={'graph/data:0': ['a', 'b', 'c', 'd', 'e']}))


if __name__ == '__main__':
    freeze_graph()
    load_frozen_graph()

The call to extract_sub_graph inside convert_variables_to_constants prunes out this op and its descendants (keys, values) if you don't include init_all_tables in output_node_names. I don't like the idea of running an initializer op during inference and having a try/except seems hacky to me. Is there another way to do this?
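The pruning described above can be sketched in plain Python (the node structure and names here are hypothetical, not the real GraphDef): extract_sub_graph keeps only the nodes reachable from the requested output names by walking node inputs, so an initializer that feeds no output is dropped unless it is itself listed as an output.

```python
def extract_sub_graph(nodes, output_names):
    """nodes maps each node name to the list of its input node names."""
    keep, stack = set(), list(output_names)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        stack.extend(nodes[name])  # walk backwards through the inputs
    return keep

graph = {
    'data': [],
    'hash_table': [],
    'key_value_init/keys': [],
    'key_value_init/values': [],
    'key_value_init': ['hash_table', 'key_value_init/keys', 'key_value_init/values'],
    'init_all_tables': ['key_value_init'],
    'output': ['hash_table', 'data'],
}

# The lookup reads from 'hash_table' but never from the initializer,
# so 'init_all_tables' and its whole input chain are pruned...
print(sorted(extract_sub_graph(graph, ['output'])))
# ['data', 'hash_table', 'output']

# ...unless the initializer is named as an output too, which is the workaround.
print('init_all_tables' in extract_sub_graph(graph, ['output', 'init_all_tables']))
# True
```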

@MrGeva

MrGeva commented Oct 16, 2017

Thanks @jkiske your workaround works for me

@tensorflowbutler
Member

It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler
Member

Nagging Assignee: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jan 23, 2018
@tensorflowbutler
Member

A member of the TensorFlow organization has replied after the stat:awaiting tensorflower label was applied.

@petewarden
Contributor

Another freeze graph issue, related to #3628 and #7162. Keeping open, since there are multiple related problems, but I haven't been able to work on them. Adding in Suharsh, since he has been working in this area.

@tensorflowbutler
Member

Nagging Assignee: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler
Member

Nagging Assignees @petewarden, @suharshs: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler
Member

Nagging Assignees @petewarden, @suharshs: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@dreamibor

dreamibor commented Mar 22, 2018

PING! Any progress?

@xinyang

xinyang commented Mar 22, 2018

@jkiske 's workaround worked for me!

We also built on top of it to make it work with tf.tables_initializer(), but it requires two other changes:

  • OUTPUT_NAMES needs to include the table initialization ops, which can be obtained with tf.get_collection(tf.GraphKeys.TABLE_INITIALIZERS).
  • The MetaGraph, instead of the Graph, needs to be what's exported/imported. This is because tf.tables_initializer() references the tf.GraphKeys.TABLE_INITIALIZERS collection. The Graph does not contain a collection_list, but the MetaGraph does.

So here's a solution that works for us:

import os

import tensorflow as tf
from tensorflow.python.framework.graph_util import convert_variables_to_constants
from tensorflow.python.ops.lookup_ops import HashTable, KeyValueTensorInitializer

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

OUTPUT_FOLDER = '/tmp'
OUTPUT_NAME = 'hash_table.pb'
OUTPUT_NAMES = ['graph/output']


def build_graph():
    d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
    init = KeyValueTensorInitializer(d.keys(), d.values())
    hash_table = HashTable(init, default_value=-1)
    data = tf.placeholder(tf.string, (None,), name='data')
    values = hash_table.lookup(data)
    output = tf.identity(values * 2, 'output')


def freeze_graph():
    with tf.Graph().as_default() as graph:
        with tf.name_scope('graph'):
            build_graph()

        with tf.Session(graph=graph) as sess:
            sess.run(tf.tables_initializer())
            print(sess.run('graph/output:0', feed_dict={'graph/data:0': ['a', 'b', 'c', 'd', 'e']}))
            for table_init_op in tf.get_collection(tf.GraphKeys.TABLE_INITIALIZERS):
                OUTPUT_NAMES.append(table_init_op.name)
            frozen_graph = convert_variables_to_constants(sess, sess.graph_def, OUTPUT_NAMES)
            tf.train.export_meta_graph(
                filename=os.path.join(OUTPUT_FOLDER, OUTPUT_NAME),
                graph_def=frozen_graph,
                collection_list=[tf.GraphKeys.TABLE_INITIALIZERS])


def load_frozen_graph():
    with tf.Graph().as_default() as graph:
        tf.train.import_meta_graph(os.path.join(OUTPUT_FOLDER, OUTPUT_NAME))
        with tf.Session(graph=graph) as sess:
            sess.run(tf.tables_initializer())
            print(sess.run('graph/output:0', feed_dict={'graph/data:0': ['a', 'b', 'c', 'd', 'e']}))


if __name__ == '__main__':
    freeze_graph()
    load_frozen_graph()
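The MetaGraph point above can be sketched with plain dicts (the structure is a simplification, not the real protos): a GraphDef is just the node list, while a MetaGraphDef also carries named collections, and tf.tables_initializer() finds the init ops through the table_initializer collection (tf.GraphKeys.TABLE_INITIALIZERS).

```python
# A bare GraphDef: nodes only, no collections.
graph_def = {'node': ['graph/output', 'init_all_tables', 'key_value_init']}

# A MetaGraphDef wraps the GraphDef and adds collection_def.
meta_graph_def = {
    'graph_def': graph_def,
    'collection_def': {
        'table_initializer': ['init_all_tables'],  # GraphKeys.TABLE_INITIALIZERS
    },
}

def tables_initializer(meta):
    """Mimic tf.tables_initializer(): group the ops in the table collection."""
    return meta['collection_def'].get('table_initializer', [])

print(tables_initializer(meta_graph_def))          # ['init_all_tables']
print(tables_initializer({'collection_def': {}}))  # [] -- a plain GraphDef import loses it
```

That is why importing a frozen GraphDef with tf.import_graph_def leaves tf.tables_initializer() with nothing to run, while exporting and importing the MetaGraph preserves the collection.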

@tensorflowbutler
Member

Nagging Assignees @petewarden, @suharshs: It has been 14 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.


@tensorflowbutler
Member

Nagging Assignees @petewarden, @suharshs: It has been 15 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler
Member

Nagging Assignees @petewarden, @suharshs: It has been 64 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler
Member

Nagging Assignees @petewarden, @suharshs: It has been 79 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler
Member

Nagging Assignees @petewarden, @suharshs: It has been 94 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@tensorflowbutler
Member

Nagging Assignees @petewarden, @suharshs: It has been 109 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.

@sseveran
Author

I think the appropriate workarounds are documented here. Additionally using SavedModel with Estimators handles this correctly. I am going to close this issue for now.

@zhjunqin
Contributor

> I think the appropriate workarounds are documented here. Additionally using SavedModel with Estimators handles this correctly. I am going to close this issue for now.

I hit this issue when using https://github.com/tensorflow/models/tree/master/official/wide_deep to freeze a graph and predict, and I believe that model does use an Estimator. I needed to add init_all_tables to the list of output node names to fix it, as jkiske mentioned above.

INPUT_SAVED_MODEL_DIR=./1564303973
INPUT_CHECKPOINT=./model
OUTPUT_NODES="head/predictions/probabilities,head/Tile,init_all_tables"
python -m tensorflow.python.tools.freeze_graph  \
--input_saved_model_dir ${INPUT_SAVED_MODEL_DIR} \
--input_checkpoint ${INPUT_CHECKPOINT} \
--input_binary=false \
--output_graph=/tmp/frozen_test.pb \
--output_node_names ${OUTPUT_NODES}

@yzbdt

yzbdt commented Mar 24, 2020

> Adding init_all_tables to the list of names to export fixes this issue. […] I don't like the idea of running an initializer op during inference and having a try/except seems hacky to me. Is there another way to do this?

(quoting @jkiske's full example above)

Using op.run instead of sess.run(tensor) throws no error:
init_op = graph.get_operation_by_name('init_all_tables')
init_op.run(session=sess)

Mbompr added a commit to Mbompr/deepr that referenced this issue Jul 1, 2020