Question: Does the model condition on future time steps? #4

Closed
llucid-97 opened this issue May 31, 2022 · 2 comments

Comments

@llucid-97

Hi, I read your paper and I found it really interesting.
I've been going through the code to understand it, port it and try it out.

I came across a section that I don't understand though:

From reading the paper, I assumed that the model should only condition on information from the past to predict the future. First of all, is this assumption correct?

If so: in the convolutional encoder, you compute encodings for each RSSM level in the Clockwork VAE stack, and then you sum them over time here:

cwvae/cnns.py, line 140 (commit 62dd505):

```python
layer = tf.reduce_sum(layer, axis=2)
```

Doesn't this mean that a prediction at time step X then has access to information from time step X+N (where N is some positive integer < "temporal abstraction factor" ** level)? Or am I misunderstanding the data flow?
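To make the data flow I'm describing concrete, here is a minimal sketch of how I read that pooling; the shapes and the window layout are my assumptions for illustration, not taken from the repo:

```python
import tensorflow as tf

# Hypothetical shapes for illustration: batch=2, 4 windows at this level,
# 3 frames per window (the temporal abstraction factor), feature dim=8.
# Axis 2 is the within-window time axis that gets summed out.
layer = tf.random.normal([2, 4, 3, 8])

# Summing over axis=2 pools every frame in a window into one encoding,
# so the pooled encoding mixes information from all frames in the window,
# including frames that come after any particular frame inside it.
pooled = tf.reduce_sum(layer, axis=2)  # shape: [2, 4, 8]
```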

@vaibhavsaxena11
Owner

Hey, thank you for your interest in my work!

To answer your question in short: no. This is a prediction model which does not condition on future timesteps at any level of the hierarchy. I elaborate on this below.

This model comprises two parts: prediction (decoder + RNN) and inference (encoder). The former is used to predict future video; the latter facilitates future video prediction when some frames have already been observed (all during testing).

During future prediction, the observation at any time t is conditioned on the state variable at time t, which in turn may be conditioned on a hierarchical variable generated at some timestep < t. This generalizes to all levels of the hierarchy and ensures that only state variables generated at timesteps <= t can propagate information into the observation variable at time t.
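To make that conditioning structure concrete, here is a rough sketch of the forward-in-time generative flow in plain Python; `prior_step`, `decode`, and `k` (the temporal abstraction factor) are stand-in names for illustration, not the repo's API:

```python
def generate(T, k, levels, prior_step, decode):
    """Sketch of hierarchical ancestral sampling, forward in time only.
    Level l ticks every k**l steps; level 0 emits a frame every step."""
    states = [[None] * T for _ in range(levels)]
    frames = []
    for t in range(T):
        for l in reversed(range(levels)):      # top level first
            if t % (k ** l) == 0:              # this level ticks at t
                parent = states[l + 1][t] if l + 1 < levels else None
                prev = states[l][t - k ** l] if t >= k ** l else None
                # The new state depends only on the previous state at this
                # level and a parent state generated at some timestep <= t.
                states[l][t] = prior_step(prev, parent)
            else:                              # held since its last tick
                states[l][t] = states[l][t - (t % (k ** l))]
        # The frame at t is decoded from the bottom-level state at t alone.
        frames.append(decode(states[0][t]))
    return frames
```

With stub callables, e.g. `prior_step = lambda prev, parent: (prev, parent)` and `decode = lambda s: s`, this runs and shows the update schedule: no state ever reads a variable from a later timestep.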

Say we observed a sequence of 10 frames and wish to generate future video. This is equivalent to requiring a posterior distribution over the observation variables at timesteps > 10, given the first 10 observations. Our (generative) model assumption dictates that to generate the observation at t=11, it is sufficient to have a posterior representation of the state at t=11, and this is exactly what we compute. The posterior of the state at t=11 can be computed using samples from the posterior at t=10 at the same level, together with the hierarchical variable at some t <= 10 (as governed by the temporal abstraction factor). Note that all information flow so far has been only forward in time.
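In pseudocode, restricted to a single level for brevity (the callables here are hypothetical stand-ins for the model's transition and decoder networks, not the repo's API):

```python
def predict_future(enc, horizon, init, posterior_step, prior_step, decode):
    """Condition on a prefix of observed-frame encodings `enc`, then roll
    the prior forward `horizon` steps. Information only flows forward."""
    state = init
    for e in enc:                         # observed prefix, e.g. frames 1..10
        state = posterior_step(state, e)  # posterior uses obs up to t only
    frames = []
    for _ in range(horizon):              # prediction, e.g. frames 11 onward
        state = prior_step(state)         # no future information enters here
        frames.append(decode(state))
    return frames
```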

Now, to compute the posteriors at timesteps <= 10, at all levels, we make use of our "inference" model. We defined the inference model so that the posterior over a state variable at any level uses all of the observations that the variable directly conditions. This makes sense because the posterior belief over a variable that conditions more than one other variable should be derived from the beliefs over all the variables it conditions. That is what the summing-over-time you referred to in the encoder accomplishes. Information does flow backward in time here, but only to compute the posterior representations of the state variables; it is of the same flavor as computing the state after observing a frame, even though the state precedes the observation during actual generation.
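In code terms, that per-level pooling looks roughly like this (shapes and window layout are assumptions for illustration; only the sum over the window's time axis mirrors the line you pointed to):

```python
import tensorflow as tf

def pool_encodings(enc, k, level):
    """Pool per-frame encodings into one posterior input per tick of a
    level that updates every k**level frames. Assumes T divides evenly."""
    b, T, f = enc.shape                 # enc: [batch, time, features]
    w = k ** level                      # frames generated per tick
    windows = tf.reshape(enc, [b, T // w, w, f])
    # Each state variable's posterior input sums the encodings of all the
    # frames that variable generated; information from within its window
    # flows back to it, but only for inference, never for generation.
    return tf.reduce_sum(windows, axis=2)  # [batch, ticks, features]
```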

I hope this clarifies your doubts about the model structure! Let me know if you have any more questions.

@llucid-97
Author

Aah ok, yes that does clarify things for me. Thank you
