Question regarding the mixture in MDN (pi) #11
Hi @Caselles During inference (testing), I first sample 1 of the k possible mixtures (from a categorical distribution defined using the πᵢ's). After picking one of the mixtures, I then sample a Gaussian z vector using the μ and σ vectors for that mixture. I have also done the same thing in the sketch-rnn demo.
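The procedure described above can be sketched in a few lines of NumPy (all names, shapes, and parameter values here are illustrative, not taken from the actual implementation):

```python
import numpy as np

# Hypothetical MDN output: k mixtures over a z_dim-sized latent.
rng = np.random.default_rng(0)
k, z_dim = 5, 64
pi = np.full(k, 1.0 / k)            # mixture weights, shape (k,)
mu = rng.normal(size=(k, z_dim))    # one z-sized mean vector per mixture
sigma = np.ones((k, z_dim))         # one z-sized std vector per mixture

j = rng.choice(k, p=pi)             # 1) sample a mixture index from Categorical(pi)
z = rng.normal(mu[j], sigma[j])     # 2) sample z from that mixture's Gaussian
```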
I understand, and this raises my concern. Let me explain: let NB_GAUSSIAN be the number of Gaussians in the MDN, and let Z_SIZE be the dimension of the latent vector produced by the VAE. In your implementation, the shape of π is (NB_GAUSSIAN): π is a distribution from which you can sample an integer ranging from 1 to NB_GAUSSIAN. Once you obtain this integer, call it j, you sample from the j-th Gaussian to obtain the output. In other implementations, such as this one in PyTorch and this one in Keras, the shape of π is (NB_GAUSSIAN, Z_SIZE): it is a list of Z_SIZE distributions over [1, ..., NB_GAUSSIAN], from which you sample, for each dimension, one of the NB_GAUSSIAN Gaussians. What are your thoughts on that? It seems weird to me that there is no fixed answer for this.
Hi @Caselles Thanks for the detailed explanation. I have reread your original message and your second, clearer explanation, and I understand your question now. Apologies, and please ignore the previous response, as it does not properly address your question! I have followed approach (2), and modelled each individual dimension of Z (say of 64 dims) as a mixture of 5 Gaussians in the MDN-RNN. I think this is a reasonable choice if our modelling assumption is that each dimension of Z is independent. Please refer to the code for DoomRNN, particularly https://github.com/hardmaru/WorldModelsExperiments/blob/master/doomrnn/doomrnn.py#L352 I would love to hear your thoughts as to whether (1) is preferable to (2). In my view, (2) might be more expressive. If you want to experiment with (1), the training of the MDN-RNN and the loss function will need to be modified to accommodate that. Thanks!
Thank you for your clear answer! I have no prior preference for method (1) or (2). I was just wondering about it because it seems that in the original paper by Bishop on Mixture Density Networks (PDF link), he only considers the case where the predicted output is scalar. Here the output is a vector, so we are left with a choice: model each component as a mixture of k Gaussians (method (2)), or model the whole vector with k multivariate Gaussians (method (1)). Indeed, it seems that method (2) is intuitively more expressive. I wonder if method (1) works; it could be that (1) works just as well and is simpler? I will be experimenting with (1). Thanks for pointing out that the training and the loss will need to be modified. I'll let you know if I have any interesting results. Thanks again for your work, and especially for answering questions here and on Reddit, and for publishing the code. The effort is much appreciated!
I wonder how the mixture is done in the MDN at test time (i.e., when we want to dream).
http://blog.otoro.net/2015/11/24/mixture-density-networks-with-tensorflow/ : Here we have an example of an MDN with k Gaussians of parameters (mu_k, sigma_k) that are 1-dimensional, i.e. mu_k and sigma_k are scalars. Since we can only mix these k Gaussians by taking one of the k possible pairs of parameters (mu_k, sigma_k), the shape of Pi is Pi = (Pi_1, ..., Pi_k), with each Pi_j being the (scalar) probability of selecting the j-th Gaussian.
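A minimal sketch of that 1-D case (all parameter values here are made up for illustration):

```python
import numpy as np

# 1-D MDN head: k scalar Gaussians mixed by a single categorical Pi.
rng = np.random.default_rng(1)
k = 3
pi = np.array([0.2, 0.5, 0.3])      # shape (k,), sums to 1
mu = np.array([-1.0, 0.0, 2.0])     # scalar mean per Gaussian
sigma = np.array([0.5, 1.0, 0.3])   # scalar std per Gaussian

j = rng.choice(k, p=pi)             # pick one of the k Gaussians
y = rng.normal(mu[j], sigma[j])     # sample a scalar output
```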
In the case of the MDN in World Models, the k Gaussians of parameters (mu_k, sigma_k) are n-dimensional (with the same dimension as z, the encoded feature coming from the VAE). Hence, when drawing one sample, there is a choice for the mixture. You can either:
1) sample a single index from Pi and use the corresponding n-dimensional Gaussian for the whole vector, or
2) sample an index independently for each of the n dimensions, so that each dimension has its own mixture over the k Gaussians.
For 1), there are k Gaussians to choose from, while for 2) there are actually n*k Gaussians to choose from.
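The two options can be sketched side by side (illustrative shapes and values only; this is not code from any of the implementations discussed):

```python
import numpy as np

# k mixture components over an n-dimensional z.
rng = np.random.default_rng(2)
k, n = 5, 64
mu = rng.normal(size=(k, n))
sigma = np.ones((k, n))

# Option 1: Pi has shape (k,); one index selects a full n-dim Gaussian.
pi1 = np.full(k, 1.0 / k)
j = rng.choice(k, p=pi1)
z1 = rng.normal(mu[j], sigma[j])                               # shape (n,)

# Option 2: Pi has shape (k, n); each dimension picks its own component,
# so the model effectively chooses among n*k univariate Gaussians.
pi2 = np.full((k, n), 1.0 / k)
js = np.array([rng.choice(k, p=pi2[:, d]) for d in range(n)])  # shape (n,)
z2 = rng.normal(mu[js, np.arange(n)], sigma[js, np.arange(n)]) # shape (n,)
```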
Which one is right, and why?
In available implementations, I see that people seem to use 2) (since the shape of Pi is (NB_GAUSSIANS, SIZE_FEATURE_VAE)).
[I also asked my question in the comments on Reddit: https://www.reddit.com/r/MachineLearning/comments/8poc3z/r_blog_post_on_world_models_for_sonic/ . If I get an answer there, I will close the issue and reference the answer there.]