
Precondition Error when is_training is set to false #17

Closed

TomRoussel opened this issue Aug 3, 2017 · 9 comments

Comments

@TomRoussel

I noticed that when the depth test graph is being built, the is_training argument for disp_net is not set to False. Won't this negatively affect test performance, since batch normalization won't be configured properly?

When setting this argument to False, an exception is raised (related to batch norm):

FailedPreconditionError: Attempting to use uninitialized value depth_net/upcnv3/BatchNorm/moving_mean
	 [[Node: depth_net/upcnv3/BatchNorm/moving_mean/read = Identity[T=DT_FLOAT, _class=["loc:@depth_net/upcnv3/BatchNorm/moving_mean"], _device="/job:localhost/replica:0/task:0/gpu:0"](depth_net/upcnv3/BatchNorm/moving_mean)]]
	 [[Node: depth_prediction/truediv/_131 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_459_depth_prediction/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I get this when using the model provided by the "download_model.sh" script.
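For context, a minimal sketch of what the is_training flag controls, using slim's batch_norm directly (TF 1.x assumed; the names here are illustrative, not the repo's code):

```python
import tensorflow as tf
slim = tf.contrib.slim

# is_training=True: normalize with the current batch's statistics and
# update moving_mean/moving_variance. is_training=False: normalize with
# the stored moving statistics, which must exist in the checkpoint.
inputs = tf.placeholder(tf.float32, [None, 128, 416, 3])
train_out = slim.batch_norm(inputs, is_training=True, scope='bn')
test_out = slim.batch_norm(inputs, is_training=False, reuse=True, scope='bn')
```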

@tinghuiz
Owner

tinghuiz commented Aug 7, 2017

This is due to a bug in the training code: the saver is only defined to save the trainable_variables, which do not include the moving mean/variance for batch norm. I am planning to re-train the model with the proper batch norm configuration sometime.
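A minimal sketch of the bug and the fix (TF 1.x / slim assumed; variable names are illustrative):

```python
import tensorflow as tf
slim = tf.contrib.slim

# A small graph with a batch norm layer, just to populate the collections.
x = tf.placeholder(tf.float32, [None, 8, 8, 3])
net = slim.conv2d(x, 16, [3, 3], normalizer_fn=slim.batch_norm)

# Buggy: only trainable variables are saved. Batch norm's moving_mean and
# moving_variance live in GLOBAL_VARIABLES but are not trainable, so they
# are missing from the checkpoint and show up as uninitialized at test time.
saver_buggy = tf.train.Saver(tf.trainable_variables())

# Fixed: save all global variables so the moving statistics are checkpointed.
saver_fixed = tf.train.Saver(tf.global_variables())
```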

@tinghuiz tinghuiz closed this as completed Aug 7, 2017
@zhenheny

Hi, have you tried restoring the moving mean/variance while testing? I tried saving everything during training and restoring the BN parameters at test time, but got much worse results.

@tinghuiz
Owner

What were your batch norm hyperparameters? The TensorFlow default 'decay' for batch_norm (https://www.tensorflow.org/api_docs/python/tf/contrib/layers/batch_norm) seems too high based on my preliminary experiments. I will update the code with a proper batch norm configuration soon (most likely within a week).

@zhenheny

@tinghuiz Thank you for the response. I used the default slim parameters (the same as in your code). With the default setting, the decay is 0.999.

@tinghuiz
Owner

From some online discussion of the batch_norm layer, a decay of 0.999 is not desirable for relatively small-scale problems (i.e., problems that don't require millions of training steps). Can you try a smaller decay such as 0.9 or 0.95 and see if that helps?
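For example, a hedged sketch of overriding the decay through normalizer_params (TF 1.x / slim assumed; layer shapes are illustrative):

```python
import tensorflow as tf
slim = tf.contrib.slim

# Lower the batch norm decay from the slim default of 0.999. With fewer
# training steps, the moving averages track the data statistics faster.
batch_norm_params = {'decay': 0.9, 'is_training': True}
x = tf.placeholder(tf.float32, [None, 128, 416, 3])
net = slim.conv2d(x, 32, [7, 7],
                  normalizer_fn=slim.batch_norm,
                  normalizer_params=batch_norm_params)
```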

@zhenheny

I will try. One more question about BN: for train_op, only trainable_variables are fed into the optimizer. My reading of the docs is that the BN parameters are not in the trainable_variables list but in the global_variables list. Do the BN mean and variance change at all if train_op only applies to trainable_vars?

@tinghuiz
Owner

tinghuiz commented Aug 22, 2017

Good point. You should replace it with something like `self.train_op = slim.learning.create_train_op(total_loss, optim)`.
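A minimal self-contained sketch of the suggestion (TF 1.x / slim assumed; the loss and shapes are placeholders): create_train_op attaches the UPDATE_OPS collection, which holds the batch norm moving-average update ops, so the statistics change during training even though they are not trainable.

```python
import tensorflow as tf
slim = tf.contrib.slim

# Toy network with batch norm, plus a placeholder loss.
x = tf.placeholder(tf.float32, [None, 8, 8, 3])
labels = tf.placeholder(tf.float32, [None, 8, 8, 16])
net = slim.conv2d(x, 16, [3, 3], normalizer_fn=slim.batch_norm)
total_loss = tf.reduce_mean(tf.square(net - labels))

optim = tf.train.AdamOptimizer(learning_rate=1e-4)
# Runs the UPDATE_OPS (moving mean/variance updates) alongside the gradient step.
train_op = slim.learning.create_train_op(total_loss, optim)

# Equivalent manual alternative without slim's helper:
# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# with tf.control_dependencies(update_ops):
#     train_op = optim.minimize(total_loss)
```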

@tinghuiz
Owner

I have removed batch_norm altogether in the latest update.

@offbye

offbye commented Dec 6, 2018

How do you save moving_mean and moving_variance alongside the trainable_variables? Has this been solved?
