Update tensorflow probability examples #698
Conversation
…rk with TF2/Keras API
Thanks so much for this contribution! Examples are really important and the TF2 switch has made a lot of ours outdated (while we've been mostly focused on updating core TFP to work with TF2), so this is awesome and much appreciated.
Sorry for being slow to get back to you; it's been a slow time due to the Christmas / New Year's holidays. This week might still be a bit slow, but I think enough of us are back that we should be able to move forward.
The code generally looks great. I've left a few mostly stylistic comments requesting some minor changes, but overall this is a big improvement on the status quo; it'll be great to get it checked in. Maybe @jburnim can also take a quick look as a second approver?
@davmre No worries about the delay, I just had some spare time and thought it would be good to give it a go. Thanks for the comments; I have addressed them and pushed the new changes. Because I have now renamed the files, the git log might look a bit different; hopefully that will not be a massive issue. Let me know if there is anything else you would like me to adjust. I will try to update some more examples in the coming weeks.
```python
for epoch in range(FLAGS.num_epochs):
  epoch_accuracy = []
  for step, (batch_x, batch_y) in enumerate(train_seq):
    # Eager mode returns a Tensor object and not a scalar
    # value for the loss, therefore only the mean accuracy
    # is displayed.
    batch_accuracy = model.train_on_batch(
        batch_x, batch_y)[1]
    epoch_accuracy.append(batch_accuracy)

    if step % 100 == 0:
      print('Epoch: {}, Batch index: {}, Accuracy: {:.3f}'.format(
          epoch, step, np.mean(epoch_accuracy)))

    if (step + 1) % FLAGS.viz_steps == 0:
      # Compute log prob of heldout set by averaging draws from the model:
      # p(heldout | train) = int_model p(heldout|model) p(model|train)
      #                   ~= 1/n * sum_{i=1}^n p(heldout | model_i)
      # where model_i is a draw from the posterior p(model|train).
      print(" ... Running monte carlo inference")
      probs = np.asarray([model.predict_generator(heldout_seq, verbose=1)
                          for _ in range(FLAGS.num_monte_carlo)])
      mean_probs = np.mean(probs, axis=0)
      heldout_log_prob = np.mean(np.log(mean_probs))
      print(" ... Held-out nats: {:.3f}".format(heldout_log_prob))

      if HAS_SEABORN:
        names = [layer.name for layer in model.layers
                 if 'flipout' in layer.name]
        qm_vals = [layer.kernel_posterior.mean()
                   for layer in model.layers
                   if 'flipout' in layer.name]
        qs_vals = [layer.kernel_posterior.stddev()
                   for layer in model.layers
                   if 'flipout' in layer.name]
        plot_weight_posteriors(names, qm_vals, qs_vals,
                               fname=os.path.join(
                                   FLAGS.model_dir,
                                   "epoch{}_step{:05d}_weights.png".format(
                                       epoch, step)))
        plot_heldout_prediction(heldout_seq.images, probs,
                                fname=os.path.join(
                                    FLAGS.model_dir,
                                    "epoch{}_step{}_pred.png".format(
                                        epoch, step)),
                                title="mean heldout logprob {:.2f}"
                                .format(heldout_log_prob))
```
Why not use Keras' `fit` method? Is it because you have faced issue #620? The code still looks unnecessarily verbose. I had actually updated this example to TF2 myself, but I hadn't opened a pull request.
The main reason for using `train_on_batch` instead of `fit` was that I am trying to preserve the operations of the original example, as I think these add some valuable insight into how Bayesian networks are trained and how they behave with respect to uncertainty. I thought this was the easier way to implement those, rather than using custom Keras callbacks, which I think would be more difficult to follow. When I first submitted the PR this example was working fine, and it still works on version 2.0, but since version 2.1 I am also getting the problem you described in issue #620.

@davmre Is there any way you might be able to point me to a starting point for dealing with issue #620? I am making the naive assumption that the fix should not be too complicated, since, as @nbro says, the problem appears to be caused by the Convolution2DFlipout layers (specifically the kernel divergence) and not the DenseFlipout layers.
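For reference, a minimal sketch of what the custom-callback route discussed above might look like. It reuses the `heldout_seq`, `FLAGS.viz_steps`, and `FLAGS.num_monte_carlo` names from the snippet above, omits the plotting calls, and is an illustration rather than the PR's actual code:

```python
import numpy as np
import tensorflow as tf

class MonteCarloEval(tf.keras.callbacks.Callback):
  """Runs the held-out Monte Carlo evaluation every `viz_steps` batches."""

  def __init__(self, heldout_seq, viz_steps, num_monte_carlo):
    super().__init__()
    self.heldout_seq = heldout_seq
    self.viz_steps = viz_steps
    self.num_monte_carlo = num_monte_carlo

  def on_train_batch_end(self, batch, logs=None):
    if (batch + 1) % self.viz_steps == 0:
      # Each predict call draws fresh weights from the posterior, so
      # averaging the probabilities approximates p(heldout | train).
      probs = np.asarray([self.model.predict(self.heldout_seq)
                          for _ in range(self.num_monte_carlo)])
      mean_probs = np.mean(probs, axis=0)
      print(' ... Held-out nats: {:.3f}'.format(np.mean(np.log(mean_probs))))

# Hypothetical usage, matching the flags above:
# model.fit(train_seq, epochs=FLAGS.num_epochs,
#           callbacks=[MonteCarloEval(heldout_seq, FLAGS.viz_steps,
#                                     FLAGS.num_monte_carlo)])
```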
I don't agree with you. I think that, as a rule of thumb, people should use `fit`, and callbacks are perfectly fine and intuitive. If you decide to use `fit`, you can solve that issue by following the instructions here: tensorflow/tensorflow#33729 (comment).
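For concreteness, my reading of the workaround in that TensorFlow thread is to fall back to the older execution path at compile time via `experimental_run_tf_function=False` (a flag present in the TF 2.0/2.1 `compile` signature). The model below is a toy stand-in just to show where the flag goes, not the example's actual architecture:

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Toy Bayesian model, only to illustrate where the compile flag goes.
model = tf.keras.Sequential([
    tfp.layers.Convolution2DFlipout(8, kernel_size=5, padding='same',
                                    activation='relu',
                                    input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tfp.layers.DenseFlipout(10, activation='softmax'),
])

# Disabling the new tf.function-based training path sidesteps the
# _SymbolicException reported in issue #620 (TF 2.0/2.1 only).
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'],
              experimental_run_tf_function=False)
```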
@nbro I am not particularly fond of this solution, because I could similarly solve the issue by disabling eager execution; in both cases the underlying cause is still not treated. However, I will make the changes you suggest (both using the `fit` function and callbacks) if @davmre also agrees with that implementation.
"In both cases the underlying cause is still not treated.", because this is a bug in TensorFlow! Hopefully, someone is already taking care of this issue. experimental_run_tf_function
, in fact, exists because there's an experimental feature related to distributed training that may not work in all cases (see #519 (comment)).
…hon.eager.core._SymbolicException in Conv2DFlipout
@davmre do you want to follow up & import this, or do you need any more changes?
@nbro I am testing this script and it is converging for me. Can you let me know what happens when you run it? As I stated before, I am happy to make any changes that the reviewers @brianwa84 @davmre suggest.
@Pyrsos What accuracy do you obtain after the first epoch? And what do you mean by converging? Do you mean that the loss decreases?
@nbro Test set accuracy after the first epoch is above 95%. And yes, when I mentioned the script is converging I was referring to the model loss decreasing (sorry, I should have been clearer).
@Pyrsos I executed part of your code and, as you say, the accuracy is "decent" at the end of the first epoch. I have an example where the accuracy remains at 10% throughout training (even though the loss decreases), but this was due to the fact that I was not using the softmax function in the last layer. Anyway, given your concerns about compatibility with the original example (and as I said above), you should use a loss function that uses the …
@Pyrsos What do the distributions of the weights (mean and std) of each layer look like after, say, the 1st, 5th and 10th epoch? Have you noticed any particular pattern? For example, that the means do not change much and actually tend to concentrate around zero, while the variances tend to increase (spread out)?
@nbro I am glad that the code worked for you. With regard to the weights, I have noticed the same behaviour you are describing, but I have not gone much deeper into experimenting with this. It is definitely an interesting research area to check, especially when considering different prior/posterior distribution settings. With regard to the loss function, I think that the terms …

Also, looking at tensorflow_probability/python/distributions/categorical.py (lines 275 to 283 at 356cfdd): in that case the sparse version is used, which explains why the inputs had to be integer values instead of one-hot (categorical) values. Notice also that the function returns the negative value, which is why the original example had to compensate by negating it (see tensorflow_probability/examples/bayesian_neural_network.py, lines 261 to 263 at 07b6fe4). The Keras … So I think that is what is happening, but please correct me if I have this wrong and there is something else going on @brianwa84 @davmre
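As a small, self-contained check of the equivalence being discussed (the values below are illustrative, not from the example): the negated `Categorical` log-probability matches Keras' sparse categorical crossentropy computed on the same logits.

```python
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 1.5, 0.3]])
labels = tf.constant([0, 1])  # integer labels, as with the sparse losses

# Negative log-likelihood under a Categorical distribution...
nll = -tfd.Categorical(logits=logits).log_prob(labels)

# ...equals Keras' sparse categorical crossentropy on logits.
ce = tf.keras.losses.sparse_categorical_crossentropy(
    labels, logits, from_logits=True)

print(nll.numpy())  # the two vectors agree elementwise
print(ce.numpy())
```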
@Pyrsos In any case, TFP provides abstractions for distributions, which should be used (in an example that shows the usage of TFP), no? I know that NLL is equivalent to CE.
@Pyrsos I noticed that in your updated example https://github.com/Pyrsos/probability/blob/update_examples/tensorflow_probability/examples/bayesian_neural_network.py, you are using …
@brianwa84, @davmre Given that there are two models under the folder …
@nbro Yes, the error you are describing also appears in …
@Pyrsos No, you are confused: I hadn't suggested using …. So, I will ask you again: do you get that error with …? And, as I had already reported in several GitHub issues, the issue apparently affects only the Bayesian convolution layers.
@nbro If you set …
@Pyrsos So, why did you create this pull request if, initially, you were not using …?
@nbro The original version of the script was using v2.0, where …
@Pyrsos It's strange. You're the second person saying that …
Sorry for the long delay on this; it fell through the cracks, partly because I was on vacation for a while. We're putting this through internal review now; hopefully it will go in within the next couple of days. A couple of notes: …
PiperOrigin-RevId: 294488343
@davmre The merged examples now have both options …
Thank you very much @davmre! I learned a lot from this process, and I am looking forward to contributing again!
As discussed in issue #607, the tensorflow probability examples currently use the 'compat' tensorflow subpackage for TF2 compatibility. In this PR I have updated two of the examples ('logistic_regression' and 'bayesian_neural_network.py') to be compatible with the TF2 paradigms, using the Keras high-level API. I have kept the same functionality for the plots and retained the model architectures. On the other hand, I have replaced the 'tf.data' input pipeline functions with the Keras Sequence class, as it can be used very easily with the 'model.(fit|evaluate|predict)_generator' functions. Also, in 'keras_bayesian_neural_network.py' I have replaced the MNIST dataset path (which appears to no longer exist at that location) with the keras.datasets.mnist path.
One further issue I have noticed is that when eager execution is enabled, the Keras model.fit_generator operation returns the loss metric as a Tensor object, while the accuracy is returned as a scalar (float) value. For this reason 'keras_bayesian_neural_network.py' uses the 'model.train_on_batch' operation instead (which also allows plotting of weights/images and performing Monte Carlo sampling on the validation set while training).
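To illustrate the Sequence-based pipeline described above, here is a minimal sketch; the class name and preprocessing are illustrative, not the exact code from the PR:

```python
import numpy as np
import tensorflow as tf

class MNISTSequence(tf.keras.utils.Sequence):
  """Minimal batching wrapper over the in-memory MNIST arrays."""

  def __init__(self, images, labels, batch_size=128):
    # Add a channel dimension and scale pixels to [0, 1].
    self.images = (images[..., np.newaxis] / 255.0).astype(np.float32)
    self.labels = labels
    self.batch_size = batch_size

  def __len__(self):
    # Number of batches per epoch.
    return int(np.ceil(len(self.images) / self.batch_size))

  def __getitem__(self, idx):
    sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
    return self.images[sl], self.labels[sl]

(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
train_seq = MNISTSequence(train_x, train_y)
heldout_seq = MNISTSequence(test_x, test_y)
```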