How to Implement Bayesian LSTM layers for time-series prediction #394
Comments
Exactly what I am looking for as well. I hope someone comes up with an approach. You can have a look at this blogpost: https://github.com/kyle-dorman/bayesian-neural-network-blogpost. You can implement whatever NN structure you want in this example. However, the epistemic uncertainty is calculated with MC dropout, which can take forever. |
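A minimal sketch of what that MC-dropout estimate looks like for an LSTM, assuming a univariate series cut into 20-step windows (the window length, layer sizes, and the `mc_predict` helper are made up for illustration): dropout stays active at prediction time, and the spread over repeated stochastic forward passes is read as epistemic uncertainty, which is why it gets slow.
```
import numpy as np
import tensorflow as tf

# Hypothetical model: 20-step univariate windows -> one-step-ahead forecast.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2,
                         input_shape=(20, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=...)  # train as usual

def mc_predict(model, x, n_samples=100):
    # training=True keeps dropout active at inference time; each forward
    # pass is one Monte Carlo sample, hence the cost noted above.
    preds = np.stack([model(x, training=True).numpy()
                      for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)  # predictive mean and spread
```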
You could hook up the RNN sequence output to a (time-distributed) dense variational layer and then a distribution output. |
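A hedged sketch of that suggestion, assuming 20-step windows with 4 features and a per-step scalar target (all shapes are illustrative; `DenseFlipout` is used here as the weight-uncertain dense layer for brevity, `tfp.layers.DenseVariational` with explicit prior/posterior functions would slot into the same place, and whether the KL losses propagate correctly through `TimeDistributed` should be verified):
```
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

n_steps, n_features = 20, 4  # hypothetical shapes

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True,
                         input_shape=(n_steps, n_features)),
    # Weight-uncertain dense head applied at every time step.
    tf.keras.layers.TimeDistributed(tfp.layers.DenseFlipout(2)),
    # Turn the two outputs per step into the mean and scale of a Normal.
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(t[..., 1:]))),
])

# Train against the negative log-likelihood of the output distribution.
negloglik = lambda y, rv_y: -rv_y.log_prob(y)
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss=negloglik)
```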
+1 to @kevinykuo. |
I had a look at this too this weekend. I am thinking of starting from the ground up. In particular, if you want to implement posterior sharpening, Sonnet has an implementation of https://arxiv.org/pdf/1704.02798.pdf: https://github.com/deepmind/sonnet/blob/master/sonnet/examples/brnn_ptb.py |
@kevinykuo thanks for the advice. I am new to Bayesian deep learning, so I am wondering if this is the same approach Kyle Dorman used in the blogpost I posted before? |
I want to build a classification model. From what I've read in the docs about the tfp.sts models, they are made for regression tasks, so it seems rather unintuitive to use them for classification. |
@alexv1247 @kevinykuo @junpenglao @JP-MRPhys Thank you so much. I wanted to use this for stock price prediction, but I ran into another issue with LSTMs (or ANNs in general): they have a delay at each prediction. Do you know how to solve it? |
Hi all, I see that nobody added anything in a year, but now I am trying to add weight uncertainty to an LSTM and wanted to use TensorFlow Probability. I was thinking of copying the Keras LSTM code and then changing the weights to variational distributions, adding the corresponding losses. Would that direct approach work? Did anyone implement such a recurrent layer within TensorFlow Probability so far? I am not sure if @kevinykuo's solution addresses the weight uncertainty problem within the LSTM blocks. |
Hi,
I also wanted to implement a Bayesian RNN, and here is what I have found so far:
- Recurrent dropout in vanilla Keras layers.
- Edward2 has an implementation of a Bayesian LSTM: https://github.com/google/edward2/blob/master/edward2/tensorflow/layers/recurrent.py. You may use it directly or learn from this implementation (see the usage sketch after the code block below).
- You may use StochasticGradientLangevinDynamics to sample from the posterior. This is a pure tfp solution; I am not sure if SGLD is compatible with tf 2.x.
- My ugly hack, which seems to be working: a custom training loop that can make any Keras model Bayesian and fits a surrogate posterior by variational inference. Note that the gradient of the posterior samples with respect to the posterior parameters is manually connected to the gradient with respect to the model's weights, because tf.assign breaks the gradient flow. This is just a proof of concept that I would like to make more mature. A more elegant approach would be to use `tf.variable_creator_scope` to replace every variable in the Keras model with `tfp.experimental.nn.util.RandomVariable`. I would love to hear any comments from the tfp team about this approach. Also, I could make a pull request with a more polished tfp example if you are interested in such a contribution.
```
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions


def _make_posterior(v):
    # Mean-field normal posterior per model variable, initialized at the
    # variable's current value.
    n = len(v.shape)
    return tfd.Independent(
        tfd.Normal(loc=tf.Variable(v),
                   scale=tfp.util.TransformedVariable(
                       0.2 + tf.zeros_like(v), tfp.bijectors.Softplus())),
        reinterpreted_batch_ndims=n)


def _make_prior(posterior):
    # Zero-mean normal prior with the same event shape as the posterior.
    n = len(posterior.event_shape)
    return tfd.Independent(
        tfd.Normal(tf.zeros(posterior.event_shape), 3.),
        reinterpreted_batch_ndims=n)


def fit_vi(model, data):
    vars = model.trainable_variables
    posterior = tfp.distributions.JointDistributionSequential(
        [_make_posterior(v) for v in vars])
    prior = tfp.distributions.JointDistributionSequential(
        [_make_prior(m) for m in posterior.model])
    losses = []
    kls = []
    opt = tf.keras.optimizers.Adam(learning_rate=0.01)

    @tf.function()
    def train(x, y):
        with tf.GradientTape(persistent=True) as tape:
            # Sample weights from the surrogate posterior and assign them to
            # the Keras model before the forward pass.
            theta = posterior.sample()
            with tf.control_dependencies(
                    [v.assign(s) for v, s in zip(vars, theta)]):
                yhat = model(x)
            loss = tf.reduce_mean(tf.math.squared_difference(y, yhat))
            kl = posterior.kl_divergence(prior) / 3000.
        kls.append(kl)
        losses.append(loss)
        # Each posterior component owns two variables (loc and scale), so the
        # per-weight gradient is repeated to line up with posterior.variables.
        grad = tape.gradient(loss, vars)
        grad2 = []
        for g in grad:
            grad2.append(g)
            grad2.append(g)
        # v.assign(theta) breaks the gradient flow, so the chain rule is
        # applied manually: dLoss/dParams = dLoss/dWeights * dSample/dParams,
        # plus the gradient of the KL term.
        sample_grad = tape.gradient(theta, posterior.variables)
        kl_grad = tape.gradient(kl, posterior.variables)
        final_grad = [g1 * g2 + g3
                      for g1, g2, g3 in zip(grad2, sample_grad, kl_grad)]
        opt.apply_gradients(zip(final_grad, posterior.variables))

    for x, y in data:
        train(x, y)
    return posterior
```
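For the Edward2 route mentioned in the list above, a rough usage sketch (the cell class name should be checked against the linked recurrent.py; the shapes, dataset size used to scale the KL terms, and loss choice are illustrative assumptions, not a vetted recipe):
```
import tensorflow as tf
import edward2 as ed

# Variational LSTM cell from edward2, wrapped in a standard Keras RNN layer.
cell = ed.layers.LSTMCellReparameterization(32)
model = tf.keras.Sequential([
    tf.keras.layers.RNN(cell, input_shape=(20, 1)),  # hypothetical shapes
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# The per-weight KL terms show up in model.losses and are folded into the
# training objective by Keras during fit; scaling them by 1/num_examples is
# left to the user.
```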
|
Thanks for your help. I will have a look. I have already tested Monte Carlo dropout and it works, though it seems hard to keep the dropout mask fixed when calling the LSTM several times to generate sequential data step by step. |
I managed to modify the LSTM code. Unfortunately, during training the loss fluctuates heavily, even with very small learning rates. I have been playing with ways to describe the scaling in the normal prior and posterior distributions. I will ping back if I get it working, to share what I have learnt. |
Also, I have found a bug in my approach: it only works with scalar variables, so I probably messed up something with the gradient. |
Following the advice in #703, I multiplied the variables by small constants inside the normal posteriors, and now it runs without diverging. It does not converge to anything good yet, but it seems to work. Changes in the posterior and prior variables probably shift the output distribution a lot, due to the sequential (deep) nature of the network, so it is hard for it to learn. Small changes due to these constants may help. |
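One possible reading of that change, sketched against the `_make_posterior` helper from the code above (the constant value and the use of `tfp.util.DeferredTensor` to keep the scaled location trainable are assumptions, not the exact fix discussed in #703):
```
def _make_posterior_scaled(v, c=0.1):
    # Hypothetical variant: the trainable location is stored "unscaled" and
    # multiplied by a small constant c inside the distribution, so optimizer
    # steps move the sampled weights only slightly.
    n = len(v.shape)
    loc = tfp.util.DeferredTensor(tf.Variable(v / c), lambda x: c * x)
    scale = tfp.util.TransformedVariable(0.2 + tf.zeros_like(v),
                                         tfp.bijectors.Softplus())
    return tfd.Independent(tfd.Normal(loc=loc, scale=scale),
                           reinterpreted_batch_ndims=n)
```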
Thanks to the tf community my problem is solved (tensorflow/tensorflow#40391). The bug was in a single line; see the linked issue for the corrected version.
I tested this on dense layers and it works quite well. |
FYI @jvdillon |
How can I implement Bayesian LSTM layers for time-series prediction with TensorFlow Probability? There is no layer for RNN deep learning in tfp.layers.