
How to Implement Bayesian LSTM layers for time-series prediction #394

Open
behdadahmadi opened this issue May 2, 2019 · 15 comments

@behdadahmadi

How can I implement LSTM layers for time-series prediction with TensorFlow Probability? There is no layer for RNNs among the TFP layers in tfp.layers.

@alexv1247

Exactly what I am looking for as well. I hope someone comes up with an approach.
You can have a look at the Edward Python package; they have an LSTM example, which is a good start.

You can also have a look at this blog post: https://github.com/kyle-dorman/bayesian-neural-network-blogpost. You can implement whatever NN structure you want in this example. However, the epistemic uncertainty is estimated with MC dropout, which can take forever.
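For what it's worth, a minimal sketch of the MC-dropout idea mentioned above (my own illustration; the shapes and layer sizes are assumptions, not taken from the blog post): keep dropout active at prediction time and average over repeated stochastic forward passes.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(30, 8)),  # hypothetical shapes
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])

x = np.random.randn(16, 30, 8).astype("float32")    # hypothetical batch
# training=True keeps dropout on, so each pass draws a different mask.
samples = np.stack([model(x, training=True).numpy() for _ in range(100)])
mean, epistemic_std = samples.mean(axis=0), samples.std(axis=0)
```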

@kevinykuo

You could hook up the RNN sequence output with a (time-distributed) dense variational and then a distribution output.
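A minimal sketch of what this could look like, assuming a Keras Sequential model plus the mean-field posterior / trainable prior helpers used in the TFP regression tutorials (the helper names, shapes, and hyperparameters below are illustrative, not from this thread):

```python
# Sketch: LSTM sequence output -> time-distributed DenseVariational ->
# Normal distribution output, trained by negative log-likelihood.
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    """Trainable mean-field normal posterior over the dense weights."""
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 + tf.nn.softplus(0.01 * t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

def prior_trainable(kernel_size, bias_size=0, dtype=None):
    """Trainable normal prior with unit scale."""
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1.0), reinterpreted_batch_ndims=1)),
    ])

num_steps, num_features, num_train = 30, 8, 1000  # hypothetical sizes

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True,
                         input_shape=(num_steps, num_features)),
    tf.keras.layers.TimeDistributed(
        tfp.layers.DenseVariational(
            units=2,
            make_posterior_fn=posterior_mean_field,
            make_prior_fn=prior_trainable,
            kl_weight=1.0 / num_train)),
    # Per-timestep Normal over the target, so y has shape [batch, steps, 1].
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.nn.softplus(0.05 * t[..., 1:]))),
])

negloglik = lambda y, rv_y: -rv_y.log_prob(y)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=negloglik)
```

Note that this puts weight uncertainty only in the dense head; the LSTM weights themselves stay deterministic.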

@junpenglao
Contributor

junpenglao commented May 6, 2019

+1 to @kevinykuo.
In addition, you can try combining the RNN sequence output with tfp.sts: either use the output as the design matrix of a tfp.sts.*LinearRegression component, or do something like mu = rnn_output + sts_model.make_state_space_model(...) and plug mu into a distribution (e.g. Gaussian). Would be interesting to see what works best!
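A rough sketch of the first variant, using the RNN features as the design matrix of an STS regression component (the placeholder tensors, the extra trend component, and the fitting settings are assumptions for illustration):

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Placeholders standing in for real data: per-timestep features produced by
# the LSTM, and the observed series to model.
rnn_output = tf.random.normal([100, 8])        # [num_timesteps, num_features]
observed_time_series = tf.random.normal([100])

# The RNN features enter the STS model as regression covariates.
regression = tfp.sts.LinearRegression(design_matrix=rnn_output)
trend = tfp.sts.LocalLinearTrend(observed_time_series=observed_time_series)
model = tfp.sts.Sum([regression, trend],
                    observed_time_series=observed_time_series)

# Fit with variational inference, as in the usual STS workflow.
surrogate = tfp.sts.build_factored_surrogate_posterior(model)
losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn=model.joint_log_prob(
        observed_time_series=observed_time_series),
    surrogate_posterior=surrogate,
    optimizer=tf.optimizers.Adam(learning_rate=0.1),
    num_steps=200)
```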

@JP-MRPhys

JP-MRPhys commented May 6, 2019

I had a look at this too this weekend. I am thinking of starting from the ground up, especially if you want to implement posterior sharpening. Sonnet has an implementation of https://arxiv.org/pdf/1704.02798.pdf: https://github.com/deepmind/sonnet/blob/master/sonnet/examples/brnn_ptb.py

@alexv1247

> You could hook up the RNN sequence output with a (time-distributed) dense variational and then a distribution output.

@kevinykuo thanks for the advice. I am new to Bayesian deep learning, so I am wondering: is this the same approach Kyle Dorman used in the blog post I posted before?


@alexv1247

> +1 to @kevinykuo.
> In addition, you can try combining the RNN sequence output with tfp.sts: either use the output as the design matrix of a tfp.sts.*LinearRegression component, or do something like mu = rnn_output + sts_model.make_state_space_model(...) and plug mu into a distribution (e.g. Gaussian). Would be interesting to see what works best!

I want to build a classification model. From what I've read in the docs, the tfp.sts models are made for regression tasks, so it seems rather unintuitive to use them for classification.

@behdadahmadi
Author

@alexv1247 @kevinykuo @junpenglao @JP-MRPhys Thank you so much. I wanted to use this for stock price prediction, but I ran into another issue with LSTMs (and ANNs in general): they have a delay at each prediction. Do you know how to solve it?

@cserpell

cserpell commented Jun 4, 2020

Hi all, I see that nobody has added anything in a year, but now I am trying to add weight uncertainty to an LSTM and wanted to use TensorFlow Probability. I was thinking of copying the Keras LSTM code and then replacing the weights with variational distributions, adding the corresponding losses. Would that direct approach work? Has anyone implemented such a recurrent layer within TensorFlow Probability so far? I am not sure whether @kevinykuo's solution addresses the weight uncertainty problem within the LSTM blocks.

@krzysztofrusek

krzysztofrusek commented Jun 4, 2020 via email

@cserpell

cserpell commented Jun 5, 2020

Thanks for your help. I will have a look. I have already tested Monte Carlo dropout and it works, though it seems hard to keep the dropout mask fixed when calling the LSTM several times, e.g. when generating sequential data step by step.

@cserpell

I managed to modify the LSTM code from tensorflow.python.keras.layers, replacing the weight variables with posterior and prior distributions. I could not add the sampling and the loss in the call method, because it is called for each recurrence step. Instead, I added the sampling process, and the loss, in an auxiliary method called just after _maybe_reset_cell_dropout_mask, which clears the current dropout mask, assuming that it runs once at the beginning of the recurrence, so the same weight sample is used for every step. A rough sketch of the idea follows this comment.

Unfortunately, during training the loss fluctuates heavily, even with very small learning rates. I have been playing with different ways to parameterize the scale of the normal prior and posterior distributions. I will ping back if I get it working, to share what I have learnt.
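A stripped-down sketch of the "sample the weights once per forward pass" idea described above (my own illustration, using a plain tanh recurrence and a manual loop instead of the full Keras LSTM machinery; it is not the actual modified Keras code):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

class VariationalSimpleRNN(tf.keras.layers.Layer):
    """Toy recurrent layer whose kernel is drawn once per forward pass."""

    def __init__(self, units, kl_weight=1.0, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.kl_weight = kl_weight  # e.g. 1 / num_training_examples

    def build(self, input_shape):
        n_in = int(input_shape[-1])
        shape = (n_in + self.units, self.units)
        self.loc = self.add_weight("loc", shape=shape,
                                   initializer="glorot_uniform")
        self.raw_scale = self.add_weight(
            "raw_scale", shape=shape,
            initializer=tf.constant_initializer(-5.0))
        self.bias = self.add_weight("bias", shape=(self.units,),
                                    initializer="zeros")

    def call(self, inputs):
        posterior = tfd.Normal(self.loc,
                               1e-5 + tf.nn.softplus(self.raw_scale))
        prior = tfd.Normal(tf.zeros_like(self.loc), 1.0)
        kernel = posterior.sample()  # one sample, reused for every time step
        self.add_loss(self.kl_weight *
                      tf.reduce_sum(tfd.kl_divergence(posterior, prior)))

        state = tf.zeros([tf.shape(inputs)[0], self.units])
        outputs = []
        for t in range(inputs.shape[1]):  # assumes a static number of steps
            joint = tf.concat([inputs[:, t, :], state], axis=-1)
            state = tf.tanh(tf.matmul(joint, kernel) + self.bias)
            outputs.append(state)
        return tf.stack(outputs, axis=1)
```

The same pattern should extend to the LSTM gates; the delicate parts are exactly the ones discussed here, i.e. where the sampling happens (once per sequence) and how the KL term is weighted.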

@krzysztofrusek

Also, I have found a bug in my approach: it only works with scalar variables, so I probably messed up something with the gradient.
Any help with this would be appreciated.

@cserpell

Following the advice in #703, I multiplied the variables by small constants inside the normal posteriors, and now it runs without diverging. It does not converge to anything good yet, but it seems to work. Changes in the posterior and prior variables probably shift the output distribution a lot, due to the sequential (deep) nature of the network, so it is hard for it to learn. The small changes induced by these constants may help; a sketch of the trick is below.
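A small sketch of what such a scaling trick can look like inside a posterior builder (for example one passed to tfp.layers.DenseVariational); the constant value and the parameterization are assumptions, not the exact code from #703:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def make_scaled_posterior(kernel_size, bias_size=0, dtype=None):
    """Mean-field normal posterior whose raw variables are down-scaled."""
    n = kernel_size + bias_size
    c = 0.01  # small constant: keeps early updates to loc/scale gentle
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=c * t[..., :n],
                       scale=1e-5 + tf.nn.softplus(c * t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])
```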

@krzysztofrusek

Thanks to the TF community, my problem is solved (tensorflow/tensorflow#40391).

The bug was in the line

loss = tf.reduce_mean(tf.math.squared_difference(y, yhat))

It should be

loss = tf.reduce_mean(tf.math.squared_difference(y, yhat[:,0]))

(presumably y has shape [batch] while yhat has shape [batch, 1], so the squared difference broadcasts to [batch, batch] and the mean is taken over the wrong tensor).

I tested this on dense layers and it works quite well.
An RNN is much harder to train, yet I managed to get it to learn something.

@brianwa84
Contributor

FYI @jvdillon
(We may try to add LSTM layers to tfp.experimental.nn; this is a useful discussion.)
