
How to Implement Bayesian LSTM layers for time-series prediction #394

Open
behdadahmadi opened this issue May 2, 2019 · 15 comments

@behdadahmadi

How can I implement LSTM layers for time-series prediction with TensorFlow Probability? There is no layer for RNNs among the TFP layers in tfp.layers.

@alexv1247

Exactly what I am looking for as well. I hope someone comes up with an approach.
You can have a look at the Edward Python package; they have an LSTM example, which is a good start.

You can also have a look at this blog post: https://github.com/kyle-dorman/bayesian-neural-network-blogpost. You can implement whatever NN structure you want in this example. However, the epistemic uncertainty is estimated with MC dropout, which can take forever.
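For what it's worth, a minimal sketch of the MC-dropout idea mentioned above (my own illustration; the shapes and layer sizes are assumptions, not taken from the blog post): keep dropout active at prediction time and average over repeated stochastic forward passes.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(30, 8)),  # hypothetical shapes
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])

x = np.random.randn(16, 30, 8).astype("float32")    # hypothetical batch
# training=True keeps dropout on, so each pass draws a different mask.
samples = np.stack([model(x, training=True).numpy() for _ in range(100)])
mean, epistemic_std = samples.mean(axis=0), samples.std(axis=0)
```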

@kevinykuo

You could hook up the RNN sequence output with a (time-distributed) dense variational and then a distribution output.
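A minimal sketch of what this could look like, assuming a Keras Sequential model plus the mean-field posterior / trainable prior helpers used in the TFP regression tutorials (the helper names, shapes, and hyperparameters below are illustrative, not from this thread):

```python
# Sketch: LSTM sequence output -> time-distributed DenseVariational ->
# Normal distribution output, trained by negative log-likelihood.
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    """Trainable mean-field normal posterior over the dense weights."""
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 + tf.nn.softplus(0.01 * t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

def prior_trainable(kernel_size, bias_size=0, dtype=None):
    """Trainable normal prior with unit scale."""
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1.0), reinterpreted_batch_ndims=1)),
    ])

num_steps, num_features, num_train = 30, 8, 1000  # hypothetical sizes

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True,
                         input_shape=(num_steps, num_features)),
    tf.keras.layers.TimeDistributed(
        tfp.layers.DenseVariational(
            units=2,
            make_posterior_fn=posterior_mean_field,
            make_prior_fn=prior_trainable,
            kl_weight=1.0 / num_train)),
    # Per-timestep Normal over the target, so y has shape [batch, steps, 1].
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.nn.softplus(0.05 * t[..., 1:]))),
])

negloglik = lambda y, rv_y: -rv_y.log_prob(y)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=negloglik)
```

Note that this puts weight uncertainty only in the dense head; the LSTM weights themselves stay deterministic.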

@junpenglao
Contributor

junpenglao commented May 6, 2019

+1 to @kevinykuo.
In addition, you can try combining the RNN sequence output with tfp.sts: either use the output as the design matrix of a tfp.sts.*LinearRegression component, or do something like mu = rnn_output + sts_model.make_state_space_model(...) and plug mu into a distribution (e.g. Gaussian). Would be interesting to see what works best!
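A rough sketch of the first variant, using the RNN features as the design matrix of an STS regression component (the placeholder tensors, the extra trend component, and the fitting settings are assumptions for illustration):

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Placeholders standing in for real data: per-timestep features produced by
# the LSTM, and the observed series to model.
rnn_output = tf.random.normal([100, 8])        # [num_timesteps, num_features]
observed_time_series = tf.random.normal([100])

# The RNN features enter the STS model as regression covariates.
regression = tfp.sts.LinearRegression(design_matrix=rnn_output)
trend = tfp.sts.LocalLinearTrend(observed_time_series=observed_time_series)
model = tfp.sts.Sum([regression, trend],
                    observed_time_series=observed_time_series)

# Fit with variational inference, as in the usual STS workflow.
surrogate = tfp.sts.build_factored_surrogate_posterior(model)
losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn=model.joint_log_prob(
        observed_time_series=observed_time_series),
    surrogate_posterior=surrogate,
    optimizer=tf.optimizers.Adam(learning_rate=0.1),
    num_steps=200)
```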

@JP-MRPhys

JP-MRPhys commented May 6, 2019

I had a look at this too this weekend. I am thinking of starting from the ground up, especially if you want to implement posterior sharpening. Sonnet has an implementation of https://arxiv.org/pdf/1704.02798.pdf: https://github.com/deepmind/sonnet/blob/master/sonnet/examples/brnn_ptb.py

@alexv1247

> You could hook up the RNN sequence output with a (time-distributed) dense variational and then a distribution output.

@kevinykuo thanks for the advice. I am new to Bayesian deep learning, so I am wondering: is this the same approach Kyle Dorman used in the blog post I posted before?


@alexv1247

> +1 to @kevinykuo.
> In addition, you can try combining the RNN sequence output with tfp.sts: either use the output as the design matrix of a tfp.sts.*LinearRegression component, or do something like mu = rnn_output + sts_model.make_state_space_model(...) and plug mu into a distribution (e.g. Gaussian). Would be interesting to see what works best!

I want to build a classification model. From what I've read in the docs, the tfp.sts models are made for regression tasks, so it seems rather unintuitive to use them for classification.

@behdadahmadi
Author

@alexv1247 @kevinykuo @junpenglao @JP-MRPhys Thank you so much. I wanted to use this for stock price prediction, but I ran into another issue with LSTMs (and ANNs in general): they have a delay at each prediction. Do you know how to solve it?

@cserpell

cserpell commented Jun 4, 2020

Hi all, I see that nobody has added anything in a year, but now I am trying to add weight uncertainty to an LSTM and wanted to use TensorFlow Probability. I was thinking of copying the Keras LSTM code and then replacing the weights with variational distributions, adding the corresponding losses. Would that direct approach work? Has anyone implemented such a recurrent layer within TensorFlow Probability so far? I am not sure whether @kevinykuo's solution addresses the weight uncertainty problem within the LSTM blocks.

@krzysztofrusek

krzysztofrusek commented Jun 4, 2020 via email

@cserpell

cserpell commented Jun 5, 2020

Thanks for your help. I will have a look. I have already tested Monte Carlo dropout and it works, though it seems hard to keep the dropout mask fixed when calling the LSTM several times, e.g. when generating sequential data step by step.

@cserpell

I managed to modify the LSTM code from tensorflow.python.keras.layers, replacing the weight variables with posterior and prior distributions. I could not add the sampling and the loss in the call method, because it is called for each recurrence step. Instead, I added the sampling process, and the loss, in an auxiliary method called just after _maybe_reset_cell_dropout_mask, which clears the current dropout mask, assuming that it runs once at the beginning of the recurrence, so the same weight sample is used for every step. A rough sketch of the idea follows this comment.

Unfortunately, during training the loss fluctuates heavily, even with very small learning rates. I have been playing with different ways to parameterize the scale of the normal prior and posterior distributions. I will ping back if I get it working, to share what I have learnt.
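A stripped-down sketch of the "sample the weights once per forward pass" idea described above (my own illustration, using a plain tanh recurrence and a manual loop instead of the full Keras LSTM machinery; it is not the actual modified Keras code):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

class VariationalSimpleRNN(tf.keras.layers.Layer):
    """Toy recurrent layer whose kernel is drawn once per forward pass."""

    def __init__(self, units, kl_weight=1.0, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.kl_weight = kl_weight  # e.g. 1 / num_training_examples

    def build(self, input_shape):
        n_in = int(input_shape[-1])
        shape = (n_in + self.units, self.units)
        self.loc = self.add_weight("loc", shape=shape,
                                   initializer="glorot_uniform")
        self.raw_scale = self.add_weight(
            "raw_scale", shape=shape,
            initializer=tf.constant_initializer(-5.0))
        self.bias = self.add_weight("bias", shape=(self.units,),
                                    initializer="zeros")

    def call(self, inputs):
        posterior = tfd.Normal(self.loc,
                               1e-5 + tf.nn.softplus(self.raw_scale))
        prior = tfd.Normal(tf.zeros_like(self.loc), 1.0)
        kernel = posterior.sample()  # one sample, reused for every time step
        self.add_loss(self.kl_weight *
                      tf.reduce_sum(tfd.kl_divergence(posterior, prior)))

        state = tf.zeros([tf.shape(inputs)[0], self.units])
        outputs = []
        for t in range(inputs.shape[1]):  # assumes a static number of steps
            joint = tf.concat([inputs[:, t, :], state], axis=-1)
            state = tf.tanh(tf.matmul(joint, kernel) + self.bias)
            outputs.append(state)
        return tf.stack(outputs, axis=1)
```

The same pattern should extend to the LSTM gates; the delicate parts are exactly the ones discussed here, i.e. where the sampling happens (once per sequence) and how the KL term is weighted.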

@krzysztofrusek

Also, I have found a bug in my approach: it only works with scalar variables, so I probably messed up something with the gradient.
Any help with this would be appreciated.

@cserpell

Following the advice in #703, I multiplied the variables by small constants inside the normal posteriors, and now it runs without diverging. It does not converge to anything good yet, but it seems to work. Changes in the posterior and prior variables probably shift the output distribution a lot, due to the sequential (deep) nature of the network, so it is hard for it to learn. The small changes induced by these constants may help; a sketch of the trick is below.
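A small sketch of what such a scaling trick can look like inside a posterior builder (for example one passed to tfp.layers.DenseVariational); the constant value and the parameterization are assumptions, not the exact code from #703:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def make_scaled_posterior(kernel_size, bias_size=0, dtype=None):
    """Mean-field normal posterior whose raw variables are down-scaled."""
    n = kernel_size + bias_size
    c = 0.01  # small constant: keeps early updates to loc/scale gentle
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=c * t[..., :n],
                       scale=1e-5 + tf.nn.softplus(c * t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])
```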

@krzysztofrusek

Thanks to the TF community, my problem is solved (tensorflow/tensorflow#40391).

The bug was in the line

loss = tf.reduce_mean(tf.math.squared_difference(y, yhat))

It should be

loss = tf.reduce_mean(tf.math.squared_difference(y, yhat[:,0]))

(presumably y has shape [batch] while yhat has shape [batch, 1], so the squared difference broadcasts to [batch, batch] and the mean is taken over the wrong tensor).

I tested this on dense layers and it works quite well.
An RNN is much harder to train, yet I managed to get it to learn something.

@brianwa84
Contributor

FYI @jvdillon
(We may try to add LSTM layers to tfp.experimental.nn; this is a useful discussion.)
