Unable to visualize Inception v3 graph in TensorBoard with TensorFlow 0.7.1 #1287

Closed
dgolden1 opened this issue Feb 25, 2016 · 21 comments

@dgolden1
Contributor

Summary

Attempting to visualize the Inception v3 graph with TensorBoard results in an empty graph (after several minutes of loading).

Update: an earlier version of this issue indicated that the progress bar hung forever, but apparently, I just didn't wait long enough.

Environment info

Operating System: OS X 10.11.3, Chrome 48.0.2564.116, Anaconda 1.2.2

If installed from binary pip package, provide:

  1. Which pip package you installed: https://storage.googleapis.com/tensorflow/mac/tensorflow-0.7.1-cp35-none-any.whl
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)": 0.7.1

Steps to reproduce

  1. Download and untar the Inception v3 model. The graph protobuf is at /tmp/imagenet/classify_image_graph_def.pb.
  2. Run this code to dump the graph:
    import os
    import os.path
    import tensorflow as tf
    from tensorflow.python.platform import gfile

    INCEPTION_LOG_DIR = '/tmp/inception_v3_log'

    if not os.path.exists(INCEPTION_LOG_DIR):
        os.makedirs(INCEPTION_LOG_DIR)
    with tf.Session() as sess:
        model_filename = '/tmp/imagenet/classify_image_graph_def.pb'
        # Load the serialized GraphDef and import it into the default graph.
        with gfile.FastGFile(model_filename, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            _ = tf.import_graph_def(graph_def, name='')
        # Write the GraphDef to an event file that TensorBoard can read.
        writer = tf.train.SummaryWriter(INCEPTION_LOG_DIR, graph_def)
        writer.close()
  3. Run TensorBoard: tensorboard --logdir /tmp/inception_v3_log
  4. Navigate to the Graphs tab at http://0.0.0.0:6006/#graphs

Expected result: the graph
Actual result: Empty graph screen (after several minutes of loading with no movement of the progress bar)

A 91 MB file (same size as the graph protobuffer) called events.out.tfevents.1456423256.[hostname] is correctly saved to the log directory, so it seems that the graph is in there somewhere.
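One quick way to confirm that the event file was written, and to compare its size with the 91 MB graph protobuf, is a small standard-library helper like the one below. It is only a sketch: the `list_event_files` name is hypothetical, and it assumes the default `events.out.tfevents.*` naming seen above.

```python
import glob
import os


def list_event_files(log_dir):
    """Return sorted (filename, size_in_bytes) pairs for event files in log_dir.

    Assumes the default naming pattern events.out.tfevents.<timestamp>.<hostname>.
    A missing directory simply yields an empty list.
    """
    pattern = os.path.join(log_dir, "events.out.tfevents.*")
    return sorted(
        (os.path.basename(path), os.path.getsize(path))
        for path in glob.glob(pattern)
    )


if __name__ == "__main__":
    for name, size in list_event_files("/tmp/inception_v3_log"):
        print("{}: {:.1f} MB".format(name, size / 1e6))
```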

What have you tried?

  1. Installing the Python 2 pip package of TensorFlow 0.7.1 in a separate conda environment: same result.
  2. Running mnist_summaries_example: the graph is shown properly in TensorBoard, so the problem seems specific to the Inception model.
@dgolden1
Contributor Author

Maybe related to #716

@dgolden1 dgolden1 changed the title Unable to visualize Inception v3 model in TensorBoard with TensorFlow 0.7.1 Unable to visualize Inception v3 graph in TensorBoard with TensorFlow 0.7.1 Feb 25, 2016
@teamdandelion
Contributor

Hi @dgolden1,

Thanks for reporting and taking the time to include a clean repro. Would you mind trying this setup against the master branch? We've been doing some work in improving the pipeline for large graphs, so it might be that this is already fixed at head.

@dgolden1
Contributor Author

dgolden1 commented Mar 7, 2016

Sorry, @danmane, I get the same results on the current master (b1bb0bb) built from source on OS X with both Anaconda Python 2.7.11 and Anaconda Python 3.5.1 (each in their own separate environments).

@ffmpbgrnn
Contributor

I built from source on Ubuntu 14.04 and met the same issue. Any updates on this?

@jameswex
Contributor

dsmilkov is out for the rest of the week, but I will be investigating this today.

@jameswex
Contributor

From what I see locally, it seems this was fixed in TensorBoard in commit 3212eb3. The GraphDef contains huge embedded constant tensors, which make it too large for the client to handle when it is passed from the TensorBoard server to the browser. That commit adds server-side filtering of large embedded constants, so the client can handle the served graph data.

So building TensorBoard from source on master should allow visualization of Inception v3. The next tagged release should also include a rebuilt TensorBoard with the fix.

@dgolden1 and @ffmpbgrnn, did you rebuild the TensorBoard frontend and backend explicitly? Perhaps rebuilding from the TF root doesn't rebuild the TensorBoard components?
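To make the filtering idea concrete, here is a minimal pure-Python sketch of that kind of server-side pass. It is not TensorBoard's code: nodes are modeled as plain dicts of serialized attribute bytes, and the 1024-byte limit and the `_too_large_attrs` key are assumptions (the key name echoes the `large_attrs_key` parameter visible in the traceback later in this thread).

```python
# Hypothetical, simplified model of a GraphDef: each node is a dict with a
# "name" and an "attr" dict mapping attribute names to serialized bytes.
LARGE_ATTRS_KEY = "_too_large_attrs"  # assumed key name


def filter_large_attrs(nodes, limit_attr_size=1024, large_attrs_key=LARGE_ATTRS_KEY):
    """Drop attributes whose serialized size exceeds limit_attr_size bytes.

    The names of dropped attributes are recorded under large_attrs_key so the
    client can still show that something was omitted.
    """
    for node in nodes:
        attrs = node["attr"]
        # Snapshot the keys, because attrs is modified inside the loop.
        for key in list(attrs.keys()):
            if len(attrs[key]) > limit_attr_size:
                del attrs[key]
                attrs.setdefault(large_attrs_key, []).append(key.encode("utf-8"))
    return nodes


if __name__ == "__main__":
    graph = [
        {"name": "conv/weights", "attr": {"dtype": b"DT_FLOAT", "value": b"\x00" * 4096}},
        {"name": "softmax", "attr": {"T": b"DT_FLOAT"}},
    ]
    filter_large_attrs(graph)
    print(graph[0]["attr"])  # {'dtype': b'DT_FLOAT', '_too_large_attrs': [b'value']}
```

Only the large constant payload is dropped; small attributes such as dtypes survive, which is what keeps the served graph small enough for the browser.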

@dgolden1
Contributor Author

Thanks for the help, @jameswex. I tried building from source again, this time on Ubuntu 14.04 with the latest master (13ea3ca) using Anaconda Python 3.5.1. Results are the same: no graph is displayed.

I built via:

bazel clean
./configure  # CPU-only
bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
# ...etc

I also tried explicitly building tensorboard and running it like:

bazel build tensorflow/tensorboard:tensorboard
./bazel-bin/tensorflow/tensorboard/tensorboard --logdir /tmp/inception_v3_log/

with the same result.

@jameswex
Contributor

It turns out that a bazel build alone will not fully rebuild TensorBoard: it rebuilds only the back end, not the front end. Manually rebuilding the TensorBoard front end is currently a multi-step process.

I believe in addition to the bazel build of tensorboard, you should also run "gulp vulcanize" in the tensorboard directory to rebuild the front-end HTML that communicates with the tensorboard back-end (see the tensorboard README.md for dependencies for running gulp commands).

@danmane, can you confirm if there are additional steps beyond gulp vulcanize and bazel build? Thanks.

@teamdandelion
Contributor

In general, running gulp vulcanize and then bazel build will get you the latest and greatest TensorBoard.
However, now that I have released a new compiled TensorBoard that is more recent than the improvements @jameswex is describing, just using bazel (on master) should be enough.

@dgolden1
Contributor Author

Some progress: I ran gulp vulcanize and then the bazel build with the latest master (e4add49). As before, I'm on Ubuntu 14.04 on Amazon EC2 with Anaconda Python 3.5.1. Now, when attempting to visualize the graph (after dumping it via the same Python snippet in my original issue), I get this TensorBoard error:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/socketserver.py", line 628, in process_request_thread
    self.finish_request(request, client_address)
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/socketserver.py", line 357, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/tensorboard/backend/handler.py", line 93, in __init__
    BaseHTTPServer.BaseHTTPRequestHandler.__init__(self, *args)
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/socketserver.py", line 684, in __init__
    self.handle()
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/http/server.py", line 415, in handle
    self.handle_one_request()
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/http/server.py", line 403, in handle_one_request
    method()
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/tensorboard/backend/handler.py", line 454, in do_GET
    data_handlers[clean_path](query_params)
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/tensorboard/backend/handler.py", line 259, in _serve_graph
    large_attrs_key)
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/tensorboard/backend/process_graph.py", line 66, in prepare_graph_for_ui
    node.attr[large_attrs_key].list.s.append(str(key))
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/google/protobuf/internal/containers.py", line 251, in append
    self._values.append(self._type_checker.CheckValue(value))
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/google/protobuf/internal/type_checkers.py", line 108, in CheckValue
    raise TypeError(message)
TypeError: 'value' has type <class 'str'>, but expected one of: (<class 'bytes'>,)

The TensorBoard app shows a "Graph visualization failed" error.

Could this be a Python 2 vs. 3 issue?

@dsmilkov
Contributor

I was able to reproduce the issue using Python 3. The problem comes down to str and bytes being equivalent types in Python 2 but not in Python 3. Moreover, converting a str to bytes in Python 3 requires specifying an encoding (protobuf uses UTF-8 for strings).

A fix is on the way; the commit should appear tomorrow. If you don't want to wait, a small patch that works only on Python 3 is to replace line 66 in process_graph.py from
node.attr[large_attrs_key].list.s.append(str(key))
to
node.attr[large_attrs_key].list.s.append(bytes(key, 'utf-8'))
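The failure can be reproduced without TensorFlow or protobuf. The sketch below uses a hypothetical `RepeatedBytesField` class that mimics the type check seen in the traceback (accept `bytes`, reject `str`); it assumes Python 3.

```python
class RepeatedBytesField:
    """Hypothetical stand-in for a protobuf repeated `bytes` field.

    Like the real field on Python 3, it accepts bytes and rejects str.
    """

    def __init__(self):
        self.values = []

    def append(self, value):
        if not isinstance(value, bytes):
            raise TypeError(
                "'value' has type {}, but expected bytes".format(type(value))
            )
        self.values.append(value)


key = "value"
field = RepeatedBytesField()

try:
    field.append(str(key))  # original line: fine on Python 2, TypeError on Python 3
except TypeError as err:
    print("str rejected:", err)

field.append(bytes(key, "utf-8"))  # the suggested Python 3 fix
field.append(key.encode("utf-8"))  # equivalent spelling on Python 3
print(field.values)  # [b'value', b'value']
```

Later TensorFlow releases expose a similar helper, `tf.compat.as_bytes`, which avoids hand-writing the encoding at each call site.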

@dsmilkov
Contributor

And also replace line 58 in process_graph.py from
keys = node.attr.keys()
to
keys = list(node.attr.keys())
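The `list()` change presumably matters because the surrounding loop modifies `node.attr` (dropping oversized attributes and adding the `large_attrs_key` entry) while iterating over its keys, and in Python 3 `keys()` returns a live view rather than a copy. The sketch below shows the same hazard with a plain dict standing in for `node.attr` (an assumption; the protobuf map type may report the problem differently). The function names are hypothetical.

```python
def drop_large_attrs_unsafe(attrs, limit=1024):
    # Iterates over the live keys() view (Python 3) while deleting from the dict.
    # On Python 2, keys() returned a list, so this version happened to work there.
    for key in attrs.keys():
        if len(attrs[key]) > limit:
            del attrs[key]


def drop_large_attrs_safe(attrs, limit=1024):
    # list() snapshots the keys first, so deleting inside the loop is safe.
    for key in list(attrs.keys()):
        if len(attrs[key]) > limit:
            del attrs[key]


if __name__ == "__main__":
    attrs = {"dtype": b"DT_FLOAT", "value": b"\x00" * 4096}
    try:
        drop_large_attrs_unsafe(attrs)
    except RuntimeError as err:
        print("unsafe version:", err)  # dictionary changed size during iteration

    attrs = {"dtype": b"DT_FLOAT", "value": b"\x00" * 4096}
    drop_large_attrs_safe(attrs)
    print("safe version:", attrs)  # {'dtype': b'DT_FLOAT'}
```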

@dgolden1
Contributor Author

@dsmilkov, making those changes worked! Thanks!

After the change has been pushed to master, I'll test again, and close this issue if it works.

@dgolden1
Contributor Author

I attempted to test the current master (d868f1e) but now I can't even open a session; I get this error:

>>> tf.Session()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 727, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 104, in __init__
    opts = tf_session.TF_NewSessionOptions(target=target, config=config)
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 266, in TF_NewSessionOptions
    _TF_SetTarget(opts, target)
TypeError: expected bytes, str found

This sounds like another Python 2 vs. 3 issue, unrelated to TensorBoard.

I'll try separately with the commit that included the process_graph.py patch, 9c7be1c.

@jameswex
Contributor

That is a separate Python 3 TensorFlow (not TensorBoard) issue, and I believe it is a known one.

@dgolden1
Contributor Author

I can't test 9c7be1c either because of yet another non-tensorboard Python 2 vs 3 issue.

On 9c7be1c:

Traceback (most recent call last):
  File "blah.py", line 3, in <module>
    import tensorflow as tf
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 82, in <module>
    from tensorflow.python.training import training as train
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/python/training/training.py", line 162, in <module>
    from tensorflow.python.training.session_manager import SessionManager
  File "/home/ubuntu/anaconda3/envs/tensorflow_py3_unstable/lib/python3.5/site-packages/tensorflow/python/training/session_manager.py", line 329
    except errors.FailedPreconditionError, e:
                                         ^
SyntaxError: invalid syntax

Maybe some Python 3 unit tests would be helpful?

I'll try this again in a few days when there will hopefully be a version that works on Python 3.
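A very crude version of the Python 3 check suggested here is to byte-compile every source file under a Python 3 interpreter. On the Python 3.5 used in this thread, that would have flagged the `except errors.FailedPreconditionError, e:` line, whose portable spelling is `except errors.FailedPreconditionError as e:`. The sketch below (the helper name is hypothetical) works on in-memory sources and uses a Python 2 `print` statement as its example; it catches only syntax errors, not runtime str/bytes problems like the ones above.

```python
def find_py3_syntax_errors(sources):
    """Return the names of sources that fail to compile under this interpreter.

    `sources` maps a file name to its source text. Run under Python 3 to catch
    Python-2-only syntax.
    """
    failures = []
    for name, text in sorted(sources.items()):
        try:
            compile(text, name, "exec")
        except SyntaxError:
            failures.append(name)
    return failures


if __name__ == "__main__":
    sources = {
        "py2_only.py": 'print "hello"\n',  # Python 2 print statement
        "portable.py": 'print("hello")\n',  # valid on Python 2 and 3
    }
    print(find_py3_syntax_errors(sources))  # ['py2_only.py'] under Python 3
```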

@dsmilkov
Contributor

We do have Python 3 tests, but they are not fully integrated, so we have to run them manually. A change yesterday broke TensorFlow on Python 3, and fixes are on the way.

@dgolden1
Contributor Author

Understandable, @dsmilkov, thanks for the explanation!

@dgolden1
Contributor Author

I can confirm the graph can now be visualized properly in f952246. Thanks for working on this!

@hscspring

I've got a similar issue in Python 2. When I ran writer = tf.train.SummaryWriter(INCEPTION_LOG_DIR, graph_def), each time I got an "events.out.tfevents.1456423256." file.

@PapaMadeleine2022

Updating TensorFlow works.
