
Improve documentation for LKJCorrCholesky and point to example usage #1902

Closed
robsalomone opened this issue Jun 6, 2019 · 27 comments

@robsalomone
Contributor

robsalomone commented Jun 6, 2019

Issue Description

LKJCorrCholesky seems to give the same log_prob irrespective of input. log_prob should probably also have a type check that ensures the correct input is given.

Environment

  • MacOS 10.14.3 and Python 3.7.3
  • Pytorch 1.1
  • Pyro 0.3.3

Code Snippet

import torch as t
import pyro.distributions as pd
X = pd.LKJCorrCholesky(2,t.tensor(1.))

smp = X.sample()
C = smp @ t.transpose(smp,0,1) # seems to sample correlation matrices correctly

print(X.log_prob(t.tensor([[.1,.2,.3]]).reshape(-1,1)))
print(X.log_prob(t.tensor([[.1,0],[.1,.3]])))
print(X.log_prob(C))

Results in:

tensor(-0.6931)
tensor(-0.6931)
tensor(-0.6931)

@fehiepsi
Member

fehiepsi commented Jun 6, 2019

@robsalomone With dimension=2, eta=1, the distribution is uniform. Hence you get that result. :)

@robsalomone
Contributor Author

robsalomone commented Jun 6, 2019

Thanks @fehiepsi --- the LKJ is of course uniform over all correlation matrices when eta=1, but does that necessarily mean it is uniform over all real-valued L? Shouldn't the Jacobian change things? Perhaps I am missing exactly what L is meant to be.

EDIT: Ah I see the case for $d = 2$ is special. Still, more generally there is something unclear about what $L$ needs to be.

If you sample from the distribution, you obtain a matrix with a value of 1. in the top left hand corner. This doesn't seem to be the same as mapping a general lower triangular matrix to the space of correlation matrices (as you would if you were reparametrizing for an LKJ prior, as I was hoping).

Also, I suspect something is up, as the following:


import torch as t
import pyro.distributions as pd
X = pd.LKJCorrCholesky(2,t.tensor(3.))

print(X.log_prob(t.tensor([[2,0],[.1,1]])))
print(X.log_prob(t.tensor([[-.5,0],[.2,1]])))
print(X.log_prob(t.tensor([[1,0],[.1,1]])))
print(X.log_prob(t.tensor([[1,0],[.5,.1]])))

Produces:

tensor(-0.0645)
tensor(-0.0645)
tensor(-0.0645)
tensor(-9.2749)

So it would appear some elements are redundant, or it is parametrised in a way not made clear in the documentation (of course, I could just be missing something obvious).

@fehiepsi
Member

fehiepsi commented Jun 6, 2019

A Cholesky factor of a correlation matrix C will always have 1 in the top-left corner because C[1, 1] = 1. Your samples do not seem to be samples from LKJCorrCholesky because the norm of each row does not equal 1. Did you get those samples from pd.LKJCorrCholesky(...).sample()? If so, then there might be a bug in the code and I will take a closer look to fix it. :)

@robsalomone
Contributor Author

@fehiepsi they are just arbitrary inputs, not samples. I think I see the issue: L needs to be the Cholesky factor of a correlation matrix. I had assumed L could be any lower-triangular matrix, and that the reparametrization went from there (which is what you need if you want to run NUTS on an LKJ prior, since a lower-triangular matrix constrained to be the Cholesky factor of a correlation matrix is a complicated constraint).

Long story short, I thought the code did more than it does :) That said, I think it would be worth extending it, as that would be the practical use (I don't see how you can do SVI or MCMC on it otherwise?).

@fehiepsi
Member

fehiepsi commented Jun 6, 2019

@robsalomone For an example of using LKJ, you can check this example in NumPyro. The log_prob implementation in Pyro is similar to the one in NumPyro, so I think that the example will work for Pyro too.

@robsalomone
Contributor Author

Ok thanks, presumably something is going on in the back end that I can't see, which transforms things to an unconstrained space so the MCMC works. I'll have to come back when I have more time.

@fehiepsi
Member

fehiepsi commented Jun 6, 2019

Hmm, the transform to unconstrained space is a bit tricky to implement, so there might be some differences between the two versions (Pyro and NumPyro). You can get unconstrained values by using

biject_to(corr_cholesky_constraint).inv(sample)

If I recall correctly, the unconstrained value in Pyro will differ from the one in NumPyro (up to an arrangement of the entries).
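
For concreteness, here is a minimal sketch of that round trip. It assumes corr_cholesky_constraint is importable from pyro.distributions.constraints; the exact import path may differ between versions.

import torch
from torch.distributions import biject_to
import pyro.distributions as dist
from pyro.distributions.constraints import corr_cholesky_constraint  # assumed import path

d = dist.LKJCorrCholesky(3, torch.tensor(2.))
L = d.sample()                                   # Cholesky factor of a correlation matrix
transform = biject_to(corr_cholesky_constraint)  # unconstrained reals -> correlation Cholesky
u = transform.inv(L)                             # map the sample to unconstrained space
L_back = transform(u)                            # should recover L up to numerical precision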

I'll have to come back when I have more time.

Sure, please let me know if you find something unclear or not working then. In the meantime, I will try to port LKJ upstream to PyTorch (but it is not my priority right now).

@robsalomone
Contributor Author

robsalomone commented Jun 6, 2019

Thanks @fehiepsi, I haven't tried NumPyro yet but I'm really hoping it is more straightforward to access the back end of things. My research is mostly on inference algorithms, and I have very much wanted to switch to Pyro, but I always seem to come across problems, or bits where things aren't fully documented.

This is odd, as all I really need is access to a model's logp, grad_logp, information about what the parameters are, their dimensions and constraints, and a straightforward guide to transforms. I really think that if you implement these things in a clear way and do a brief tutorial for researchers (happy to help here), you would get a lot more people on board!

I believe the new refactoring of the HMC is a start in this direction?

@fehiepsi
Member

fehiepsi commented Jun 6, 2019

What a wonderful suggestion! Thank you a lot for that! I do agree that many points (transforms, constraints, ...) are not explained in detail in the docs, so we'll keep this feedback in mind during the refactoring process.

We will address the logp, params, and transforms points as part of the next 0.4 release. Your feedback will be very valuable, so I will notify you on related PRs. :)

Btw, I would recommend using Pyro unless you want to speed up your HMC.

@neerajprad
Member

@robsalomone - We are really interested in hearing from researchers on how we can improve our interface for HMC. Right now, we are moving towards a NumPyro like interface for HMC in Pyro. NumPyro's HMC/NUTS will be much faster than Pyro's HMC for the foreseeable future, but we may lack support for some distribution classes (LKJCorrCholesky is available though). The interface supports both Pyro's model API as well as specifying a potential function by hand.

If you are only using HMC, I would actually encourage you to try out NumPyro, both because I think that will be much faster, and also so that we can get feedback on the updated interface (which will make its way into Pyro). If you need any help translating your model, please don't hesitate to reach out.

@neerajprad
Member

Also here is the example usage of LKJCorrCholesky in Pyro - https://github.com/pyro-ppl/pyro/blob/dev/examples/lkj.py.

@neerajprad neerajprad changed the title LKJCorrCholesky log_prob not working Improve documentation for LKJCorrCholesky and point to example usage Jun 6, 2019
@neerajprad neerajprad added this to the 0.4 release milestone Jun 6, 2019
@robsalomone
Contributor Author

@neerajprad Thanks for your reply. I work mainly in inference algorithms, so I am interested in easy coding of models and comparison with existing methods. If we could make some more accessible documentation, I could try and get more colleagues to use Pyro in their research, and I would be able to implement some other algorithms and contribute there (e.g., Stein VGD).

Regarding LKJCorrCholesky, the example usage you link seems to illustrate exactly my original issue. Namely, the LKJCorr distribution is defined on the space of Cholesky decompositions of correlation matrices. Thus, for NUTS to work, there needs to be a bijection to unconstrained space. This is not implemented (as it is not simple). In that example, I suspect NUTS will yield incorrect results, as the log_prob of LKJCorrCholesky accepts unconstrained inputs but gives nonsensical answers (answers nonetheless) if the input is not the Cholesky factor of a correlation matrix! As NUTS samples on R^d, packing some of those values into a lower-triangular matrix will not always result in the required input, but this will go unnoticed (put an assertion that L @ L.T is a correlation matrix into the log_prob function and you will see).

Thus, the way I expected log_prob to work is also the way the authors of the example you mentioned expected it to work, but as pointed out by @fehiepsi, it does not reparametrize all the way back to unconstrained reals.
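
(For reference, a rough sketch of the assertion suggested above; this helper is illustrative, not part of Pyro.)

import torch as t

def is_corr_cholesky(L, atol=1e-6):
    # Check that L is lower triangular and that L @ L.T has unit diagonal,
    # i.e. that L is the Cholesky factor of a correlation matrix.
    C = L @ L.transpose(-1, -2)
    lower = t.allclose(L, L.tril(), atol=atol)
    unit_diag = t.allclose(C.diagonal(dim1=-2, dim2=-1), t.ones(L.shape[-1]), atol=atol)
    return lower and unit_diag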

@robsalomone
Contributor Author

To ensure it all works properly when done, we should probably compare output to Stan on a random effects model where the effects have an LKJ correlation matrix and a scale vector in quadratic form (this parametrises the covariance matrix).
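
(A rough sketch of the kind of model meant here, with illustrative names and priors rather than a definitive setup: a multivariate normal whose covariance Cholesky factor is built from a per-dimension scale vector and an LKJ correlation Cholesky factor.)

import torch
import pyro
import pyro.distributions as dist

def random_effects_model(y):
    d = y.shape[-1]
    sigma = pyro.sample("sigma", dist.HalfCauchy(torch.ones(d)).to_event(1))   # per-dimension scales
    L_corr = pyro.sample("L_corr", dist.LKJCorrCholesky(d, torch.tensor(2.)))  # correlation Cholesky factor
    L_cov = sigma.unsqueeze(-1) * L_corr                                       # diag(sigma) @ L_corr
    with pyro.plate("data", y.shape[0]):
        pyro.sample("obs", dist.MultivariateNormal(torch.zeros(d), scale_tril=L_cov), obs=y)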

@neerajprad
Member

neerajprad commented Jun 7, 2019

I think there is either some misunderstanding, or possibly a bug in LKJCorrCholesky (@fehiepsi - let me know if it is the latter and I'm missing something).

In that example, I suspect NUTS will yield incorrect results as the log_prob of LKJCorrCholesky allows unconstrained outputs but gives nonsensical answers (but answers nonetheless) if the input is not the Cholesky of a Correlation matrix!

In your above code, if you turn on validation check, using:

X = pd.LKJCorrCholesky(2,t.tensor(1.), validate_args=True)

you will find that it raises a ValueError: The value argument must be within the support when you try to call log_prob via X.log_prob(t.tensor([[.1,.2,.3]])). In general, it is quite expensive to do argument validation, but if you run inference in Pyro using SVI or HMC, argument validation will be turned on by default for all distributions and this will throw an error. This is true of all PyTorch distributions: there is no guarantee that .sample or .log_prob will return NaN or throw an exception when parameters or samples to be scored are out of bounds unless validate_args=True.
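
(A small self-contained sketch of this behavior, reusing one of the arbitrary inputs from the earlier snippet.)

import torch as t
import pyro.distributions as pd

X = pd.LKJCorrCholesky(2, t.tensor(1.), validate_args=True)
bad = t.tensor([[2., 0.], [.1, 1.]])  # not the Cholesky factor of a correlation matrix
try:
    X.log_prob(bad)
except ValueError as err:
    print(err)  # The value argument must be within the support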

Namely, the LKJCorr distribution is defined on the space of Cholesky Decompositions of Correlation Matrices.

That's correct.

Thus, for NUTS to work, there needs to be a bijection to unconstrained space. This is not implemented (as it is not simple).

This should be implemented and will happen behind the scenes. As a user, you should not have to worry about transforming to unconstrained space, running HMC and transforming the samples back to the latent sites' constrained domain. If you are curious, the part where the jacobian adjustment to the log density computation happens is here.

If you have a more complete model (even a Stan model), please feel free to share with us. I think the issue probably isn't with the LKJCorrCholesky distribution (or if there is, it is in addition), but there might be some confusion that Pyro's general modeling and HMC interface is causing.

@fehiepsi
Member

fehiepsi commented Jun 7, 2019

Yes, I'll check it. There might be an issue at the transform.

@robsalomone
Contributor Author

robsalomone commented Jun 7, 2019

Ah, thanks @neerajprad, things are becoming clear now. These things aren't clear unless one is familiar with the code base. What I was hoping for was access to the log_pdf of the transformed distribution (more generally, the log_pdf of a full model on an unconstrained space).

Why bother with such things? Ideally I want this sort of low-level access so I can implement inference algorithms that aren't in the standard Pyro framework. I would like (1) to code a model very quickly and easily, either through a model function which can then produce a (transformed-to-be) unconstrained log joint, along with a function mapping the parameters back to the actual space, or by simply adding (transformed-to-be) unconstrained log_pdfs; and then (2) to get its (unconstrained-space) gradient.

As you mention, currently some aspects of this are hidden from the end user and used by HMC and SVI; however, for those wishing to implement new algorithms, we need an easier way to access these aspects. This is useful because some model types with special structure allow specialised inference algorithms.

@fehiepsi I think this functionality would also be great in the next version. I believe you would just need to abstract out parts of the existing HMC code!

@fehiepsi
Member

fehiepsi commented Jun 7, 2019

Oops, I get your idea now. Yes, those will come with the next version. I just want to let you know that most of this is possible now with

from pyro.infer.mcmc.util import initialize_model

init_params, potential_fn, transforms, _ = initialize_model(any_pyro_model)

Here potential_fn is the log density with the transforms' log-det adjustment, init_params are parameters in unconstrained space, and transforms are the transforms from constrained space to unconstrained space. To map from unconstrained space to constrained space, you can use

constrained_sample = {name: transforms[name].inv(param) for name, param in init_params.items()}

I agree that this requires extra knowledge to do the transform job. Ideally, instead of returning transforms, we would return constrain_fn

init_params, potential_fn, constrain_fn, _ = initialize_model(any_pyro_model)

so that, given a sample params from unconstrained space, we can transform it back with

constrained_sample = constrain_fn(params)

but currently, we need to keep the low level transforms for backward compatibility.

For gradients, just like with any PyTorch function, you can do

params = {name: p.requires_grad_(True) for name, p in params.items()}
potential_energy = potential_fn(params)
potential_energy.backward()

Or you can use the following hidden utility.
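
Separately from the hidden utility mentioned above, here is a minimal end-to-end sketch putting these pieces together. The toy model is illustrative, and the model_args keyword is assumed to match the initialize_model signature in this version of Pyro.

import torch
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc.util import initialize_model

def model(data):
    scale = pyro.sample("scale", dist.HalfNormal(1.))  # positive (constrained) latent
    with pyro.plate("data", data.shape[0]):
        pyro.sample("obs", dist.Normal(0., scale), obs=data)

data = torch.randn(10)
init_params, potential_fn, transforms, _ = initialize_model(model, model_args=(data,))

# Evaluate the potential and its gradient in unconstrained space.
params = {name: p.detach().requires_grad_(True) for name, p in init_params.items()}
potential_energy = potential_fn(params)
potential_energy.backward()

# Map the unconstrained parameters back to the constrained space.
constrained_sample = {name: transforms[name].inv(p) for name, p in params.items()}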

@fehiepsi
Member

fehiepsi commented Jun 7, 2019

@neerajprad I just checked the transform and can confirm that both versions (in Pyro and NumPyro) give similar results (modulo precision issues, if any).

@neerajprad
Member

@robsalomone - Thanks for explaining. As @fehiepsi mentioned, I think you are probably just looking for the initialize_model function which will give you most of the functionality that you need. You might want to tinker with that part of the code and modify it for your purpose. It will be exposed more prominently in the documentation prior to the release.

As is usually the case, the documentation caters more to users than to contributors, and you may have to get into the code a bit. Please feel free to ask us any questions on the forum as you navigate your way around the codebase, and let us know if there are other things that could be made more intuitive, or features that you would like to see implemented.

@robsalomone
Contributor Author

Thanks @neerajprad and @fehiepsi, just what I needed!

@fehiepsi
Member

fehiepsi commented Jul 3, 2019

Hi @robsalomone, @neerajprad already made the documentation about LKJ clearer in #1939 (sorry about forgetting to ping you in that PR). I'll make sure to ping you when we update the tutorial to illustrate initialize_model usage.

@robsalomone
Contributor Author

Thanks @fehiepsi! A thought: perhaps initialize_model should be pushed up from infer.mcmc to simply infer, as it is useful for any number of things.

@fehiepsi
Member

fehiepsi commented Jul 3, 2019

@robsalomone Currently, initialize_model is only used in mcmc so I am not sure if it is useful for other algorithms. Maybe we can expose it in mcmc.__init__ instead of mcmc.util? Btw, I think @neerajprad will have a better idea on this request. :)

@robsalomone
Contributor Author

@fehiepsi my thoughts were that it is the first thing any inference algorithm/diagnostic will need, e.g. Kernelized Stein Discrepancy or Stein Variational Gradient Descent, both of which aren't MCMC. Many algorithms use the log joint and its gradient but aren't MCMC. Hence the thought, just to make it easier for researchers to find.

@fehiepsi
Member

fehiepsi commented Jul 9, 2019

@robsalomone The initialize_model utility is used in examples and tutorials to generate potential_fn in #1941. As far as I can see, everything works as expected, even under multiprocessing. I hope this utility will be helpful for you in making other cool inference algorithms. :)

@neerajprad Any idea on @robsalomone's suggestion to move initialize_model up to the infer level (e.g. infer.util) because it is useful for other non-MCMC algorithms? I don't have a strong opinion about it, so I'll defer to you. :)

@neerajprad
Member

neerajprad commented Jul 9, 2019

Even though I think initialize_model can be more generally useful, it is still implemented in a way that is somewhat specific to HMC, e.g. drawing the initial set of parameters using our current or Stan-specific strategy, the potential_fn name, etc. Other inference algorithms, in particular SVI with autoguides, have their own more general initialization strategies. What I think we should do is abstract out more general utility functions in infer, like log_joint, get_transforms, etc., which can themselves be used within autoguides and HMC's initialize_model. We can delegate that refactoring to a separate PR though.

@fehiepsi
Member

I am going to close this issue. Let's keep discussions here or open a new one for not-yet-addressed issues.
