Improve documentation for LKJCorrCholesky and point to example usage #1902
@robsalomone With dimension=2, eta=1, the distribution is uniform. Hence you get that result. :)
Thanks @fehiepsi --- the LKJ is of course uniform over all correlation matrices when eta=1, but does that necessarily mean it is uniform over all real-valued L? Shouldn't the Jacobian change things? Perhaps I am missing exactly what L is meant to be. EDIT: Ah, I see. If you sample from the distribution, you obtain a matrix with a value of 1. in the top left-hand corner. This doesn't seem to be the same as mapping a general lower-triangular matrix to the space of correlation matrices (as you would if you were reparametrizing for an LKJ prior, as I was hoping). Also, I suspect something is up, as the following:
Produces:
So it would appear some elements are redundant, or it is parametrised in a way not made clear in the documentation (of course, I could just be missing something obvious).
The Cholesky factor of a correlation matrix C will always have 1 in its top-left entry, since C[0,0] = L[0,0]^2 = 1.
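A quick check of this (a sketch, using the same `LKJCorrCholesky` constructor that appears later in this thread):

```python
import torch
import pyro.distributions as dist

d = dist.LKJCorrCholesky(2, torch.tensor(1.))
L = d.sample()
print(L[0, 0])  # always 1., since C[0,0] = L[0,0]**2 = 1 for any correlation matrix
```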
@fehiepsi they are just arbitrary inputs, not samples. I think I see the issue: L needs to be the Cholesky factor of a correlation matrix. I had assumed L was any lower-triangular matrix, and that the reparametrization went from there (which is what you need if you want to do NUTS on an LKJ prior, since a lower-triangular matrix constrained to be the Cholesky factor of a correlation matrix is a complicated constraint). Long story short, I thought the code did more than it did :) That said, I think it would be worth extending it, as that would be the practical use (I don't see how you can do SVI or MCMC on it otherwise?).
@robsalomone For an example of using LKJ, you can check this example in NumPyro. The log_prob implementation in Pyro is similar to the one in NumPyro, so I think that the example will work for Pyro too.
Ok, thanks. Presumably something is going on in the back end that I can't see, which transforms things to an unconstrained space so the MCMC works. I'll have to come back when I have more time.
Hmm, the transform to unconstrained space is a bit tricky to implement, so there might be some differences between the two versions (Pyro and NumPyro). You can get unconstrained values by using the inverse of the constraining transform.
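A minimal sketch of what this looks like (assuming the corr-Cholesky constraint is registered with `biject_to`, which importing `pyro.distributions` should take care of):

```python
import torch
from torch.distributions import biject_to
import pyro.distributions as dist

d = dist.LKJCorrCholesky(2, torch.tensor(1.))
transform = biject_to(d.support)   # maps unconstrained reals to valid Cholesky factors
L = d.sample()                     # constrained sample
z = transform.inv(L)               # its unconstrained representation
```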
If I recall correctly, the unconstrained value in Pyro will be different from the one in NumPyro (up to a rearrangement).
Sure, please let me know if you find something unclear or not working. In the meantime, I will try to port LKJ upstream to PyTorch (though it's not a priority for me right now).
Thanks @fehiepsi. I haven't tried NumPyro yet, but I'm really hoping it is more straightforward to access the back end of things. My research is mostly on inference algorithms, and I have very much wanted to switch to using Pyro, but I always seem to come across problems, or bits where stuff isn't fully documented. This is odd, as all I really need is access to a model's logp and grad_logp, information about what the parameters are (dimensions and constraints), and a straightforward guide to transforms. I really think that if you implement these things in a clear way and write a brief tutorial for researchers (happy to help here), you would get a lot more people on board! I believe the new refactoring of the HMC is a start in this direction?
What a wonderful suggestion! Thank you for that! I agree that many points (transforms, constraints, ...) are not explained in detail in the docs, so we'll keep this feedback in mind during the refactoring process. We will address the logp, params, and transforms points as part of the next 0.4 release. Your feedback will be very valuable, so I will notify you on related PRs. :) Btw, I would recommend using Pyro unless you want to speed up your HMC.
@robsalomone - We are really interested in hearing from researchers on how we can improve our interface for HMC. Right now, we are moving towards a NumPyro-like interface for HMC in Pyro. NumPyro's HMC/NUTS will be much faster than Pyro's HMC for the foreseeable future, but we may lack support for some distribution classes (LKJCorrCholesky is available, though). The interface supports both Pyro's model API and specifying a potential function by hand. If you are only using HMC, I would actually encourage you to try out NumPyro, both because I think it will be much faster, and so that we can get feedback on the updated interface (which will make its way into Pyro). If you need any help translating your model, please don't hesitate to reach out.
Also, here is an example usage of LKJCorrCholesky.
@neerajprad Thanks for your reply. I work mainly on inference algorithms, so I am interested in easy coding of models and comparison with existing methods. If we could make some more accessible documentation, I could try to get more colleagues to use Pyro in their research, and I would be able to implement some other algorithms and contribute there (e.g., Stein VGD).

Regarding LKJCorrCholesky, the example usage you link seems to illustrate exactly my original issue. Namely, the LKJCorrCholesky distribution is defined on the space of Cholesky factors of correlation matrices. Thus, for NUTS to work, there needs to be a bijection to unconstrained space. This is not implemented (as it is not simple). In that example, I suspect NUTS will yield incorrect results, as the log_prob of LKJCorrCholesky accepts unconstrained inputs but gives nonsensical answers (answers nonetheless) if the input is not the Cholesky factor of a correlation matrix! Since NUTS samples on R^d, packing some of those values into a lower-triangular matrix will not always result in the required input, but this will go unnoticed (put an assertion that L @ L.T is a correlation matrix into the log_prob function and you will see). So the way I expected log_prob to work is also the way the authors of the example you mention expected it to work, but as pointed out by @fehiepsi, it does not reparametrize all the way back to unconstrained reals.
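A quick way to see the point about invalid inputs, since the rows of a valid factor must have unit norm (a minimal check; the matrix is arbitrary):

```python
import torch

# an arbitrary lower-triangular matrix with positive diagonal...
L = torch.tensor([[1.0, 0.0],
                  [0.7, 0.9]])
C = L @ L.T
# ...does not yield a correlation matrix: the diagonal is [1.0, 1.3], not all ones
print(torch.diag(C))
```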
To ensure it all works properly when done, we should probably compare output to Stan on a random effects model where the effects have an LKJ correlation matrix and a scale vector in quadratic form (this parametrises the covariance matrix); see the sketch below.
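For reference, the covariance construction described above looks roughly like this in Pyro (a hypothetical sketch; the priors and names are illustrative, not from the thread):

```python
import torch
import pyro
import pyro.distributions as dist

def model(y):
    # per-dimension scales and an LKJ prior on the correlation's Cholesky factor
    sd = pyro.sample("sd", dist.HalfCauchy(torch.ones(2)).to_event(1))
    L_corr = pyro.sample("L_corr", dist.LKJCorrCholesky(2, torch.tensor(1.)))
    # diag(sd) @ L_corr is the Cholesky factor of the covariance diag(sd) C diag(sd)
    L_cov = sd.unsqueeze(-1) * L_corr
    pyro.sample("y", dist.MultivariateNormal(torch.zeros(2), scale_tril=L_cov), obs=y)
```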
I think there is either some misunderstanding, or possibly a bug in LKJCorrCholesky (@fehiepsi - let me know if it is the latter and I'm missing something).
In your above code, if you turn on validation by using `X = pd.LKJCorrCholesky(2, t.tensor(1.), validate_args=True)`, you will find that it raises a ValueError for inputs outside the support.
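For instance (hypothetical input; the exact error text may differ across versions):

```python
import torch as t
import pyro.distributions as pd

X = pd.LKJCorrCholesky(2, t.tensor(1.), validate_args=True)
bad = t.tensor([[1.0, 0.0],
                [0.7, 0.9]])  # not the Cholesky factor of a correlation matrix
X.log_prob(bad)               # raises ValueError instead of returning a bogus value
```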
That's correct.
This should be implemented and will happen behind the scenes. As a user, you should not have to worry about transforming to unconstrained space, running HMC, and transforming the samples back to the latent sites' constrained domain. If you are curious, the part where the Jacobian adjustment to the log density happens is here. If you have a more complete model (even a Stan model), please feel free to share it with us. I think the issue probably isn't with the LKJCorrCholesky distribution (or if there is one, it is in addition), but there might be some confusion caused by Pyro's general modeling and HMC interface.
Yes, I'll check it. There might be an issue in the transform.
Ah, thanks @neerajprad, things are becoming clear now. These things aren't clear unless one is familiar with the code base. What I was hoping for was access to the log_pdf of the transformed distribution (more generally, the log_pdf of a full model on an unconstrained space).

Why bother with such things? Ideally I want this sort of low-level access so I can implement inference algorithms that aren't in the standard Pyro framework. So I would like (1) to code a model quickly and easily, either through a model function which can then produce a (transformed-to-be-)unconstrained log joint, together with a function mapping the parameters back to the actual space, or by simply adding (transformed-to-be-)unconstrained log_pdfs; and then (2) to get its (unconstrained-space) gradient.

As you mention, currently some aspects of this are hidden from the end user and used by HMC and SVI; however, for those wishing to implement new algorithms, we need an easier way to access these aspects. This is useful as some model types with special structure allow specialised inference algorithms. @fehiepsi I think this functionality would also be great in the next version. I believe you would just need to abstract out parts of the existing HMC code!
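For concreteness, here is a sketch of one way to evaluate a model's log joint with the existing poutine API (the model and values are hypothetical; this is on the constrained space, so the unconstrained version still needs the transforms discussed below):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro import poutine

def model():
    z = pyro.sample("z", dist.Normal(0., 1.))
    pyro.sample("obs", dist.Normal(z, 1.), obs=torch.tensor(0.5))

# evaluate the log joint at a chosen value of the latent z
conditioned = poutine.condition(model, data={"z": torch.tensor(0.3)})
trace = poutine.trace(conditioned).get_trace()
print(trace.log_prob_sum())
```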
Oops, I get your idea now. Yes, those will come with the next version. I just want to let you know that most of this is possible now; here is roughly how.
I agree that it requires extra knowledge to do this transform job. Ideally, the API would hand back the transform as well, so that given a sample from the unconstrained space you could map it to the constrained one; but currently we need to keep the low-level machinery. For gradients, just like with any PyTorch function, you can do something like this.
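(A sketch, reusing `biject_to` as above; the Jacobian term turns the constrained log_prob into the log density on the unconstrained space.)

```python
import torch
from torch.distributions import biject_to
import pyro.distributions as dist

d = dist.LKJCorrCholesky(2, torch.tensor(1.))
transform = biject_to(d.support)

z = transform.inv(d.sample()).detach().requires_grad_(True)  # unconstrained value
x = transform(z)                                             # back to the constrained space
logp = d.log_prob(x) + transform.log_abs_det_jacobian(z, x)
logp.backward()
print(z.grad)  # gradient of the unconstrained log density w.r.t. z
```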
Or you can use the following hidden utility.
@neerajprad I just checked the transform and confirmed that both versions (Pyro and NumPyro) give similar results (modulo precision issues, if any).
@robsalomone - Thanks for explaining. As @fehiepsi mentioned, I think you are probably just looking for the utilities mentioned above. As is usually the case, the documentation caters more to users than to contributors, and you may have to get into the code a bit. Please feel free to ask us any questions on the forum as you navigate your way around the codebase, and let us know if there are other things that could be made more intuitive, or features that you would like to see implemented.
Thanks @neerajprad and @fehiepsi, just what I needed!
Hi @robsalomone, @neerajprad already made the documentation about LKJ clearer in #1939 (sorry about forgetting to ping you in that PR). I'll make sure to ping you when we update the tutorial to illustrate this.
Thanks @fehiepsi! A thought: perhaps that utility shouldn't live under the MCMC module, since it is more generally useful?
@robsalomone Currently, it is an internal utility of the MCMC code.
@fehiepsi my thoughts were that it is the first thing any inference algorithm/diagnostic will need, e.g. Kernelized Stein Discrepancy or Stein Variational Gradient Descent, both of which aren't MCMC. Many algorithms use the log joint and its gradient but aren't MCMC. Hence the thought, just to make it easier for researchers to find.
@neerajprad Any idea on @robsalomone's suggestion to move it out of the MCMC module?
Even though I think
I am going to close this issue. Let's keep discussing here, or open a new one for not-yet-addressed issues.
Issue Description
LKJCorrCholesky seems to give the same log_prob irrespective of input. log_prob should probably also have a validation check that ensures the correct input is given.
Environment
Code Snippet
Results in: