This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

example of loading a tensorflow model #34

Open
machuck opened this issue Mar 8, 2018 · 16 comments

@machuck

machuck commented Mar 8, 2018

Hi, do we have an example of loading a TensorFlow model somewhere in the docs already? If not, could you provide one? Thanks!

@ludwigschubert ludwigschubert added enhancement New feature or request lucid.modelzoo labels Mar 11, 2018
@ludwigschubert ludwigschubert self-assigned this Mar 11, 2018
@ludwigschubert
Contributor

We do not yet, and we will add one. Thank you for your patience!

@machuck
Author

machuck commented Mar 13, 2018

Thanks, looking forward to it!

@JegZheng

Thanks for the reply. I just noticed that the model should come from moralex@'s modelzoo when using optvis. Are there any differences between the models in the modelzoo and our own trained models? BTW, could you please provide a link to moralex@'s modelzoo? Thanks.

@ludwigschubert
Contributor

Hey everyone (@machuck, @Wursthub, @emptyewer, @JegZheng)!
Writing up a guide has proven more challenging than I expected. If you want to help out and have the time, you can take a look at this WIP notebook and leave any comments you have:

https://drive.google.com/file/d/1PPzeZi5sBN2YRlBmKsdvZPbfYtZI-pHl

Thanks for your continued patience! :-)

@tul-urte

tul-urte commented May 9, 2018

Hi Ludwig,

I am successfully training with TensorFlow 1.8.0 on the standard SSD MobileNet V1 model config file, and exporting graphs using the object_detection script "export_inference_graph.py". I'm working in continuous mode, whereby I collect and classify new images every day, feed some of them back into the ground-truth data set, and then retrain (once a week).

So I'm very keen to use Lucid to explore the output models and to inform the "feedback-to-ground-truth" step in a more scientific way. Obviously it's amazingly cool, too.

So I jumped in, tried the tutorial notebook, and it worked perfectly. But when I switched to one of my own exported graphs I got errors. What did I expect? I've managed to get this far without really understanding what's going on with TensorFlow graphs.

Eventually I understood that the "input" and "layer" choices are critical. There are so many different errors that occur if these are poor choices, largely due to the "non-backpropagatability" of operations in the graph between the designated "input" and "layer". I was massively helped by the example on this site: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/14_DeepDream.ipynb

What works for me is:

# Lucid Example

input_name = 'Preprocessor/sub'
layer = "FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/Relu6"

# the layer has 128 channels, therefore pick some index between 0..127
index = 105

import os
import lucid.optvis.objectives as objectives
import lucid.optvis.param as param
import lucid.optvis.render as render
import lucid.optvis.transform as transform
from lucid.modelzoo.vision_base import Model

# locations
graph_dir = "/XENOPHON/nutsack/graphs/nutsack_11_v8_510x510_02_150k/"
graph_name = "inference_graph.pb"
labels_name = "object-detection.pbtxt"

graph_path = os.path.join(graph_dir, graph_name)
labels_path = os.path.join(graph_dir, labels_name)

# zoo-like model wrapper around the exported frozen graph
class SSDMobilenetV1(Model):
    def __init__(self, graph_path, labels_path, input_name, image_shape=None, image_value_range=None):
        self.model_path = graph_path
        self.labels_path = labels_path
        self.input_name = input_name
        self.image_shape = image_shape
        self.image_value_range = image_value_range
        super().__init__()

scale = 300
image_shape = [510, 510, 3]
image_value_range = (-1, 1)

param_f = lambda: param.image(scale, fft=True, decorrelate=True)

def get_objective(layer, index=0):
    return objectives.channel(layer, index)

transforms = [
    transform.pad(16),
    transform.jitter(32),
    transform.random_scale([n / 100. for n in range(80, 120)]),
    # transform.random_rotate(list(range(-10, 10)) + list(range(-5, 5)) + 10 * list(range(-2, 2))),
    transform.jitter(2),
]

model = SSDMobilenetV1(graph_path, labels_path, input_name, image_shape, image_value_range)
model.load_graphdef()

_ = render.render_vis(model, get_objective(layer, index), transforms=transforms, param_f=param_f, thresholds=(scale,))

@ludwigschubert
Contributor

ludwigschubert commented May 11, 2018

Hi @tul-urte,
Thanks for your patience, and I'm sorry you're running into this much trouble. When we started writing lucid there was no standard yet for how to save graphs, so there's a lot of confusion around the topic.

Regarding your concrete issue:

Model classes use the attribute input_name to know which tensor in your model to feed the input image into. That name comes from your original model definition, or may be set in an export script, etc. From the names of the files you're referencing, I'm guessing it originates at this line! It looks like the right name, so what's wrong?

In your current code you create a new placeholder that's not connected at all to your original model's loaded graph. That's why you get the "No gradients…" ValueError. You shouldn't have to modify the create_input function. Instead, ensure you know the name of your model's input and use it in your Model subclass.

Use the code in the linked notebook (for node in graph_def.node: …) to look at the names of the nodes in your graph. Maybe you're just missing a prefix.
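For reference, a minimal sketch of that loop (assuming graph_path points at your frozen .pb file):

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile(graph_path, "rb") as f:
    graph_def.ParseFromString(f.read())

# print every node so you can spot the input name (and any prefix)
for node in graph_def.node:
    print(node.name, node.op)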

Hope this helped! Let me know whether it did or did not. :-)

@qihongl

qihongl commented May 19, 2018

Thanks for this thread! It would be awesome if this package interfaced with generic TensorFlow models more easily.

@ilamanov

ilamanov commented Jun 1, 2018

Hi @ludwigschubert,

I am getting a
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'Variable:0' shape=(1, 28, 28, 1) dtype=float32_ref>"] and loss Tensor("Neg:0", shape=(), dtype=float32)
when I try to visualize a channel using
vis = render.render_vis(nasnet, objectives.channel("conv2/weights", 2), param_f).

You mentioned that this can be caused by the placeholder being disconnected from the graph. However, I did not modify the create_input function, and if you look at the graph https://drive.google.com/file/d/1_6e3YhDzwlYNPScImDGZeAXBTmgANb5V/view?usp=sharing
the placeholder called "Variable" (lower left) seems to be connected to the loss named "Neg" (lower right) through a node (weights) which comes from conv2/weights. Can you suggest a fix for this, please? Thank you!

Modelzoo metadata that I used:

class Net(Model):
    model_path = 'pre_trained_model/frozen_model.pb'
    image_shape = [28, 28, 1]
    image_value_range = (0, 1)
    input_name = 'Reshape'

@ludwigschubert
Contributor

@namnov you use input_name = 'Reshape', which seems to result in that disconnected Placeholder -> Reshape sub-graph within "import". input_name should reference the place where you feed in images / want to get images out of. I can't know whether the lower half of your graph is "just" preprocessing… if it isn't, then "Variable" seems like the input_name you want.

Please let me know if this helped; I may have a follow-up question.
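As an illustration (not part of the original exchange; "frozen_model.pb" is a stand-in path), one way to shortlist candidates for input_name is to look for Placeholder ops in the frozen graph:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Placeholders are the usual feed points; a node just past the
# preprocessing (like 'Preprocessor/sub' earlier in this thread)
# can also work.
for node in graph_def.node:
    if node.op == "Placeholder":
        print(node.name)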

@ilamanov

ilamanov commented Jun 3, 2018

@ludwigschubert Thank you! It turns out the problem was that I was using objectives.channel("conv2/weights", 2). You can't maximize the activation of weights; they're constants. What I meant to do was objectives.channel("conv2/Conv2D", 2), i.e. maximize the activation of the convolution operation. That fixed the problem.

Some comments:
The input name Reshape was correct. It seems that tf.import_graph_def imports unused parts of the graph as well, but disconnects them; this is why I had that disconnected Placeholder -> Reshape. The same behavior can be observed in the model from the example "Importing a Graph into modelzoo.ipynb".

Also, the Variable node was created by lucid in lowres_tensor, which explains why it has operations like random crop on it.
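To restate the fix as a sketch (layer names as above; Net is the Model subclass from the earlier comment):

import lucid.optvis.objectives as objectives
import lucid.optvis.render as render

model = Net()
model.load_graphdef()

# target the conv op's output activations, not the constant weights:
# a constant has no gradient path back to the input image
obj = objectives.channel("conv2/Conv2D", 2)
vis = render.render_vis(model, obj)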

@ThanatchaPanpairoj

ThanatchaPanpairoj commented Jul 20, 2018

UPDATE: I got the code to run by changing input_name to 'Preprocessor/sub' after looking at @tul-urte's post. THANK YOU. I think the problem may have been selecting an input node too early in the graph, which pulled in preprocessor operations whose gradients couldn't be calculated.

Hi @ludwigschubert,

I trained an unmodified version of ssdmobilenet_v1 from TensorFlow's object detection API (models/research/object_detection), and I have been trying to visualize the activation of neuron groups in one of the feature extraction layers. I started with this code:

class SSDMobilenet_v1(Model):
  model_path = '/home/thanatcha/object_recognition/models/model/trained/frozen_inference_graph.pb'
  labels_path = '/home/thanatcha/object_recognition/data/classes.txt'
  image_shape = [640, 480, 3]
  image_value_range = (0, 255)
  input_name = 'ToFloat:0'

where frozen_inference_graph.pb was created with the TensorFlow object detection API's built-in graph export script, models/research/object_detection/export_inference_graph.py.
I tried running:

img = load("/home/thanatcha/object_recognition/uncluttered+cluttered/image_rgb/rgb_raw_0001.png")
neuron_groups(img, "FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_11_pointwise/Relu6", 6)

but ended up getting the following (vis.py contains the neuron_groups function copied from the notebook):

Traceback (most recent call last):
  File "vis.py", line 19, in <module>
    model.load_graphdef()
  File "vis.py", line 44, in neuron_groups
    group_icons = render.render_vis(model, obj, param_f, verbose=False)[-1]
  File "/home/thanatcha/lucid/lucid/optvis/render.py", line 94, in render_vis
    relu_gradient_override)
  File "/home/thanatcha/lucid/lucid/optvis/render.py", line 183, in make_vis_T
    vis_op = optimizer.minimize(-loss, global_step=global_step)
  File "/home/thanatcha/lucid/env6/local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 399, in minimize
    grad_loss=grad_loss)
  File "/home/thanatcha/lucid/env6/local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 511, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/thanatcha/lucid/env6/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 532, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/home/thanatcha/lucid/env6/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 591, in _GradientsHelper
    to_ops, from_ops, colocate_gradients_with_ops)
  File "/home/thanatcha/lucid/env6/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 200, in _PendingCount
    between_op_list, between_ops, colocate_gradients_with_ops)
  File "/home/thanatcha/lucid/env6/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1427, in MaybeCreateControlFlowState
    loop_state.AddWhileContext(op, between_op_list, between_ops)
  File "/home/thanatcha/lucid/env6/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1233, in AddWhileContext
    outer_forward_ctxt = forward_ctxt.outer_context
AttributeError: 'NoneType' object has no attribute 'outer_context'

I looked into this and I think it has something to do with the graph_def containing while operations, which require a "control flow context" to calculate gradients.
A specific op that causes this error is Preprocessor/map/while/Exit1.
From the error message, it appears that forward_ctxt is None.
forward_ctxt comes from tensorflow/tensorflow/python/ops/control_flow_ops.py : _GetWhileContext, which calls op._get_control_flow_context(), which returns None.
I reran the colab example with InceptionV1 and found that the control_flow_context of all the ops in that graph is also None. This isn't a problem for InceptionV1 because its graph def doesn't contain 'while' operations that require a control flow context.
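For anyone checking their own graph for this, a quick sketch (assuming graph_def is the parsed GraphDef of the frozen model; the op-type set is my guess at the usual while-loop ops):

# list control-flow nodes that make gradient computation fail here
control_flow_ops = {"Enter", "Exit", "Merge", "NextIteration", "Switch", "LoopCond"}
while_nodes = [n.name for n in graph_def.node if n.op in control_flow_ops]
print("%d control-flow nodes, e.g. %s" % (len(while_nodes), while_nodes[:5]))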

So I'm thinking this means the graph_def created by models/research/object_detection/export_inference_graph.py does not contain enough information to visualize ssdmobilenetv_1. I followed the notebook "Importing a Graph into modelzoo.ipynb" and encountered an error after running:

python env7/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py \
  --input_graph=/home/thanatcha/object_recognition/log/july5_sim_on_real/trained/frozen_inference_graph.pb \
  --input_checkpoint=/home/thanatcha/object_recognition/log/july5_sim_on_real/trained/model.ckpt \
  --input_binary=true --output_graph=./mobilenetv1_graphdef_frozen.pb.modelzoo \
  --output_node_names=detection_classes
Traceback (most recent call last):
  File "env7/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py", line 382, in <module>
    run_main()
  File "env7/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py", line 379, in run_main
    app.run(main=my_main, argv=[sys.argv[0]] + unparsed)
  File "/home/thanatcha/lucid/env7/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "env7/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py", line 378, in <lambda>
    my_main = lambda unused_args: main(unused_args, flags)
  File "env7/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py", line 272, in main
    flags.saved_model_tags, checkpoint_version)
  File "env7/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py", line 254, in freeze_graph
    checkpoint_version=checkpoint_version)
  File "env7/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py", line 128, in freeze_graph_with_def_protos
    var_list=var_list, write_version=checkpoint_version)
  File "/home/thanatcha/lucid/env7/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1284, in __init__
    self.build()
  File "/home/thanatcha/lucid/env7/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1296, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/thanatcha/lucid/env7/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1333, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/thanatcha/lucid/env7/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 759, in _build_internal
    saveables = self._ValidateAndSliceInputs(names_to_saveables)
  File "/home/thanatcha/lucid/env7/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 666, in _ValidateAndSliceInputs
    for converted_saveable_object in self.SaveableObjectsForOp(op, name):
  File "/home/thanatcha/lucid/env7/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 634, in SaveableObjectsForOp
    variable)
TypeError: names_to_saveables must be a dict mapping string names to Tensors/Variables. Not a variable: Tensor("BoxPredictor_0/BoxEncodingPredictor/biases:0", shape=(12,), dtype=float32)

I managed to get past this by using the non-binary graph def file (pbtxt instead of pb):

python env7/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py \
  --input_graph=/home/thanatcha/object_recognition/log/july5_sim_on_real/trained/inference_graph.pbtxt \
  --input_checkpoint=/home/thanatcha/object_recognition/log/july5_sim_on_real/trained/model.ckpt \
  --input_binary=false --output_graph=./mobilenetv1_graphdef_frozen.pb.modelzoo \
  --output_node_names=detection_classes

Modifying the model_path to /home/thanatcha/lucid/mobilenetv1_graphdef_frozen.pb.modelzoo and running the neuron_groups function again results in the same AttributeError: 'NoneType' object has no attribute 'outer_context' as before.

If you have any suggestions or comments, they would be really appreciated. Sorry for such a long post, but I thought more detail would be better.

@CasperN

CasperN commented Aug 8, 2018

Hi, I'm trying to work through your "importing a model" tutorial and put my Keras-trained autoencoder through Lucid to study the encoded representations, but I'm stuck on the part of the tutorial below:

python /usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py \
  --input_graph=graph_def.pb\
  --input_checkpoint=ckpt \
  --output_graph=output_graphdef.pb.modelzoo \
  --output_node_names="conv2d_3/Relu:0"

The traceback is:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 382, in <module>
    run_main()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 379, in run_main
    app.run(main=my_main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 378, in <lambda>
    my_main = lambda unused_args: main(unused_args, flags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 272, in main
    flags.saved_model_tags, checkpoint_version)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 254, in freeze_graph
    checkpoint_version=checkpoint_version)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/tools/freeze_graph.py", line 128, in freeze_graph_with_def_protos
    var_list=var_list, write_version=checkpoint_version)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1284, in __init__
    self.build()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1296, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1333, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 759, in _build_internal
    saveables = self._ValidateAndSliceInputs(names_to_saveables)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 666, in _ValidateAndSliceInputs
    for converted_saveable_object in self.SaveableObjectsForOp(op, name):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 640, in SaveableObjectsForOp
    variable, "", name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 122, in __init__
    self.handle_op = var.op.inputs[0]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2125, in __getitem__
    return self._inputs[i]
IndexError: list index out of range

Perhaps relevant is how I transferred the model from Keras format to TensorFlow format:

m = tf.keras.models.load_model(path.join(FLAGS.model_dir, "model.h5"))

# checkpoint the Keras session's variables
saver = tf.train.Saver()
sess = tf.keras.backend.get_session()
saver.save(sess, path.join(FLAGS.model_dir, "ckpt"))

# and write out the graph definition separately
tf.train.write_graph(sess.graph_def, FLAGS.model_dir, "model.GraphDef")
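A possible alternative (my sketch, not verified against this exact setup) is to freeze in-session with tf.graph_util.convert_variables_to_constants, skipping the freeze_graph.py code path entirely; note that output node names are op names, without the ":0" tensor suffix:

import tensorflow as tf
from os import path

m = tf.keras.models.load_model(path.join(FLAGS.model_dir, "model.h5"))
sess = tf.keras.backend.get_session()

# op names only, no ":0" tensor suffix
output_names = [t.op.name for t in m.outputs]

# bake the variables into constants in one step
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph_def, output_names)
with tf.gfile.GFile(path.join(FLAGS.model_dir, "frozen_model.pb"), "wb") as f:
    f.write(frozen.SerializeToString())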

@ricardobarroslourenco

@ludwigschubert any updates on how to add a Keras/TF model to the modelzoo? Also, does lucid already support multichannel images (more than 3 channels)? I'm working with @CasperN on that CAE model, and we definitely want to use Lucid on it :)

@johnknelsonintific

@CasperN did you ever find a solution to this? I am also trying to freeze a model I've compiled and trained via Keras, and I get the same error when freezing via freeze_graph:

...
return self._inputs[i]
IndexError: list index out of range

@jacky22043

jacky22043 commented Dec 4, 2018

> [quoting @CasperN's comment of Aug 8, 2018 above in full]

@CasperN @johnknelsonintific Hi, I have the same problem. Do you have a solution now? Thanks!

@CasperN

CasperN commented Dec 4, 2018

@ricardobarroslourenco I think you solved this one.
