
prediction different between TF Serving 1.4 and TF 1.4 #656

Closed

paragon00 opened this issue Nov 15, 2017 · 9 comments

Comments

paragon00 commented Nov 15, 2017

After updating our TF, Keras, and TF Serving, I'm seeing a difference in prediction values on the same model and images between TF/Keras on the one hand and TF Serving on the other. I updated to TF 1.4 and Keras 2.0.9, and built TF Serving from the 1.4 branch (I tried master too). Prediction on some random images then gives:

Keras, TensorFlow, TensorFlowServing, TrueLabel
0.294510304928, 0.294510304928, 0.306598514318, 1
0.973454713821, 0.973454713821, 0.974921882153, 1
0.0169313177466, 0.0169313177466, 0.109000883996, 0
0.969210922718, 0.969210922718, 0.964440405369, 1
0.996860027313, 0.996860027313, 0.998536705971, 1
0.996983230114, 0.996983230114, 0.994152128696, 1
0.259784668684, 0.259784668684, 0.300680160522, 0
0.989252388477, 0.989252388477, 0.97792416811, 1

i.e. Keras and TF predict the same values, but TF Serving gives different numbers. It's possible we didn't upgrade TF Serving correctly (although we didn't see any errors).

Is anyone else getting this? We didn't see this on TF 1.3.
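
As a point of reference, here is a minimal sketch of one way to run the same batch through both paths and compare; the model file, servable name, port, and tensor keys are hypothetical, and it assumes the gRPC client API that shipped with TF Serving 1.4:

import numpy as np
import tensorflow as tf
from keras.models import load_model
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

batch = np.random.rand(1, 224, 224, 3).astype(np.float32)  # stand-in image batch

# In-process prediction with the Keras model (hypothetical file name)
keras_model = load_model('model.h5')
keras_score = keras_model.predict(batch)

# The same batch through TF Serving's gRPC PredictionService
channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'                       # hypothetical servable name
request.inputs['images'].CopyFrom(tf.contrib.util.make_tensor_proto(batch))
response = stub.Predict(request, 10.0)                     # 10-second timeout
print(keras_score, response.outputs['scores'].float_val)   # 'scores' is a hypothetical output key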

zmjjmz commented Nov 20, 2017

I'm seeing something very similar at the moment, although the numerical difference is a bit more dramatic.

Notably, I have an embedding layer that contains some all-zero vectors (e.g. for padding / OOV tokens). Breaking it down by the steps it takes to export a Keras model to TF Serving:

  • The Keras model itself produces the right output (i.e., 0)
  • The exported TF graph has the correct values when inspected with the inspect_checkpoint tool (see the sketch after this list)
  • The prediction proto response does not have the correct values (I had the model output the embeddings directly)
    -- Specifically, instead of 0 I see 0.00273621
    -- All the dtypes check out
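
A minimal sketch of that second check, reading the SavedModel's variables directly (roughly what inspect_checkpoint does); the export path and variable name are hypothetical:

import tensorflow as tf

# Read the variables shipped with the SavedModel; path and variable name are hypothetical.
reader = tf.train.NewCheckpointReader('/tmp/servable/1/variables/variables')
print(reader.get_variable_to_shape_map())
print(reader.get_tensor('embedding_1/embeddings'))  # the padding/OOV rows should be all zeros here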

I'm using 1.4 here but can't confirm that I wasn't seeing this in 1.3. I guess I could try downgrading if that helps?

Forgot to mention: this is all on CPU, using the default builds (i.e. none of the available CPU optimizations are being used).

If needed, I could probably put together a reproducible test case, but there are a lot of moving parts :)

Also: the servable I'm testing has ops with boolean or int32 outputs -- all of those come out fine! However, the float outputs are all funky.

zmjjmz commented Nov 20, 2017

Further note: I tried to determine whether the corruption happens inside the graph by abusing a ThresholdedReLU Keras layer to zero out the embeddings and then add those zeros back to the original embeddings, and then comparing the original embedding output to the one with zeros added. If the zeros were broken within the graph, I'd see different numbers between them -- however, they come out the same.
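
A minimal sketch of that diagnostic, with hypothetical vocabulary and embedding sizes:

from keras.layers import Input, Embedding, ThresholdedReLU, add
from keras.models import Model

tokens = Input(shape=(10,), dtype='int32')
emb = Embedding(input_dim=1000, output_dim=5)(tokens)
# With a huge theta every activation falls below the threshold and becomes 0,
# so this layer should emit exact zeros inside the graph.
zeros = ThresholdedReLU(theta=1e6)(emb)
# Adding those zeros back should be a no-op; if the zeros were corrupted
# in-graph, emb and emb_plus_zeros would disagree.
emb_plus_zeros = add([emb, zeros])
model = Model(inputs=tokens, outputs=[emb, emb_plus_zeros])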

What I did notice on a second run is that I have two embedding vectors that are all zeros (one to distinguish OOV tokens from pad tokens -- don't worry about it), and they come out as two different garbage vectors. So, for example, the following sequence:

[1 1 1 0 0 0 0 0 0 0] should resolve to

[[ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]]

after the embedding layer, but from TF serving I get

[[ 0.00374124  0.02665842 -0.04161887  0.01480421 -0.02126383]
 [ 0.00374124  0.02665842 -0.04161887  0.01480421 -0.02126383]
 [ 0.00374124  0.02665842 -0.04161887  0.01480421 -0.02126383]
 [ 0.01171316 -0.03365946  0.0402073  -0.02044135  0.00470774]
 [ 0.01171316 -0.03365946  0.0402073  -0.02044135  0.00470774]
 [ 0.01171316 -0.03365946  0.0402073  -0.02044135  0.00470774]
 [ 0.01171316 -0.03365946  0.0402073  -0.02044135  0.00470774]
 [ 0.01171316 -0.03365946  0.0402073  -0.02044135  0.00470774]
 [ 0.01171316 -0.03365946  0.0402073  -0.02044135  0.00470774]
 [ 0.01171316 -0.03365946  0.0402073  -0.02044135  0.00470774]]

I'm working on putting together a minimal repro -- currently what I have relies on a bunch of weird custom code / keras layers that's not worth including.

I've also noticed that the vectors change between versions (not between requests), so my previous comment about what the 0 gets changed to is inaccurate. Also, if I inspect the output of the ThresholdedReLU, I do see all zeros (though sometimes -0.0, which I'm not sure what to make of).

zmjjmz commented Nov 21, 2017

Here's a gist that should (at least, on my system) reproduce this issue:

https://gist.github.com/zmjjmz/64cf9771922aa6cf58da6233e022f056

zmjjmz commented Nov 21, 2017

I was initially encountering this issue in a servable that used a lookup table, so when I call add_meta_graph_and_variables I pass tensorflow.saved_model.main_op.main_op() to initialize the table when the servable is loaded. In the test case that's unnecessary, and if I remove it the outputs match up!

So I think I can narrow it down to something going wrong in that main_op.
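
For reference, a hedged sketch of the export path being described here; the export directory, tensor keys, and the toy model are stand-ins:

import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Embedding
from keras.models import Model

# toy stand-in for the real servable
tokens = Input(shape=(10,), dtype='int32')
model = Model(inputs=tokens, outputs=Embedding(1000, 5)(tokens))

builder = tf.saved_model.builder.SavedModelBuilder('/tmp/servable/1')
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'tokens': model.input}, outputs={'embeddings': model.output})
builder.add_meta_graph_and_variables(
    K.get_session(),
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature},
    # main_op() is re-run by TF Serving when the servable loads; dropping this
    # argument is what makes the outputs match in the test case above.
    main_op=tf.saved_model.main_op.main_op())
builder.save()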

zmjjmz commented Nov 21, 2017

OK, so playing with this a bit more, I think the issue is specifically with tensorflow.python.ops.variables.global_variables_initializer.

Currently main_op() produces a grouped op that is essentially:

from tensorflow.python.ops import control_flow_ops, lookup_ops, variables

# main_op() groups the table, local-variable, and global-variable initializers
main_op_new = control_flow_ops.group(
    lookup_ops.tables_initializer(),
    variables.local_variables_initializer(),
    variables.global_variables_initializer())

If I remove just that last initializer, this issue goes away, and I'm able to use the model as normal!
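
In other words, passing a grouped op like the sketch below as main_op= (the name custom_main_op is mine) keeps the table initialization but skips the global variables initializer:

from tensorflow.python.ops import control_flow_ops, lookup_ops, variables

# Same grouping as main_op(), minus global_variables_initializer(); lookup
# tables still get initialized when the servable loads.
custom_main_op = control_flow_ops.group(
    lookup_ops.tables_initializer(),
    variables.local_variables_initializer())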

There's definitely something strange going on in global_variables_initializer, which I realize may have to do with the way I'm exporting the model (using the Keras backend session, which may be the wrong way).

sukritiramesh (Contributor) commented

Thanks for reporting back, @zmjjmz. Resolving, since this seems export-specific for now.

zmjjmz commented Dec 19, 2017

Should I open this as a separate issue on the main tensorflow repo then?

sukritiramesh (Contributor) commented

@zmjjmz Is this to follow up on the global variable initializer? If so, sure.

paragon00 (Author) commented Dec 19, 2017

For me, the difference disappeared in a recent TF / Keras update. As far as I could tell, it was a discrepancy between TF prediction and TF Serving prediction that existed briefly and was fixed in a recent release.
