Observing deterministically transformed output #568

Closed
riedelcastro opened this Issue Nov 12, 2017 · 15 comments

Comments

riedelcastro commented Nov 12, 2017

Thanks for a great library!

I have the following model (which samples alright):

import pyro
import pyro.distributions as dist

def add_one_or_two(guess):
    init = 2
    # draw True/False with probabilities `guess` (2017-era functional distribution API)
    choice = pyro.sample("choice", dist.categorical, ps=guess, vs=[False, True])
    if choice:
        outcome = init + 1
    else:
        outcome = init + 2
    return outcome

Now I would like to get a marginal for "choice" after having observed 4 as the output of add_one_or_two. It is not 100% clear to me from the Conditioning on Models intro how this would look in this case. Conditioning seems to be tied to the outputs of sample statements, but here the output of the model is just a deterministic transformation of the choice sample. How should I go about this?

fritzo added the question label Nov 12, 2017

fritzo (Member) commented Nov 12, 2017

Pyro only allows conditioning on sample sites, not on arbitrary deterministic functions of sample sites. This is because conditioning is implemented as a transformation from pyro.sample to pyro.observe, and observations can only be at sample sites. Pyro has some support for invertible transformations in TransformedDistribution.

You may be able to work around this using a Delta distribution:

def add_one_or_two(guess):
    ...
    outcome = pyro.sample("outcome", dist.delta, outcome)
    return outcome

(but I haven't tried this!)
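
(For reference, a minimal sketch of the invertible case mentioned above, assuming the present-day distribution-object API rather than the 2017-era functional API used in this thread; the model and numbers are made up for illustration. Wrapping an invertible transform in a TransformedDistribution makes the transformed value itself a sample site, so it can be conditioned on directly.)

import torch
import pyro
import pyro.distributions as dist
from torch.distributions.transforms import AffineTransform

def scaled_model():
    # y = 2 * z + 3 with z ~ Normal(0, 1); the affine map is invertible,
    # so y has a tractable density and can be observed directly
    y_dist = dist.TransformedDistribution(
        dist.Normal(0., 1.), [AffineTransform(loc=3., scale=2.)])
    return pyro.sample("y", y_dist)

conditioned = pyro.condition(scaled_model, data={"y": torch.tensor(4.)})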

dustinvtran (Collaborator) commented Nov 12, 2017

That's an interesting limitation. Most non-trivial implicit models, including physical simulators and GANs, produce their outputs through non-invertible functions. How might poutines support this natively? (It would also be nice to support this on the algorithm side; doesn't this limitation refute the "universal" claim?)

riedelcastro commented Nov 12, 2017

Great, thanks! Will give this a go. I don't know how I missed the delta; I tried a categorical with one element in the support, but that didn't fly.

eb8680 (Collaborator) commented Nov 12, 2017

@riedelcastro I'd suggest using a Bernoulli with a very small positive probability of your constraint being false to avoid infinities.

@dustinvtran you can use this Delta/Bernoulli pattern to express conditioning on any Boolean proposition being true, so conditioning in Pyro is in principle as expressive as conditioning in Church. Of course this isn't very efficient, because there's no extra information that inference algorithms can exploit to improve their chances of satisfying the constraint. You're right that many interesting models don't have tractable densities, so implementing less naive versions of this pattern with ABC likelihoods or discriminators is high on our roadmap.
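
(A minimal sketch of that Delta/Bernoulli constraint pattern, assuming the present-day Pyro API rather than the 2017-era one used above; the latents and the proposition are made up for illustration.)

import torch
import pyro
import pyro.distributions as dist

def constrained_model():
    a = pyro.sample("a", dist.Bernoulli(0.5))
    b = pyro.sample("b", dist.Bernoulli(0.5))
    # the Boolean proposition we want to condition on: "a or b"
    holds = ((a + b) > 0).float()
    # probability near 1 when the proposition holds, small but positive
    # otherwise, so importance weights stay finite
    p = 0.999 * holds + 0.001 * (1. - holds)
    pyro.sample("constraint", dist.Bernoulli(p), obs=torch.tensor(1.))
    return a, b

Conditioning "constraint" to be 1 steers inference toward latents that satisfy the proposition, but, as noted above, the algorithm gets no structural hint about how to satisfy it.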

dustinvtran (Collaborator) commented Nov 12, 2017

Can you provide a snippet of how that works? For example, say eps ~ N(0, 1) is 1-dimensional noise, followed by x = tanh(eps * W), where W is a 1 x 2 trainable matrix. We observe one 2-dimensional data point x. I don't totally follow how to infer eps or estimate W.

dustinvtran (Collaborator) commented Nov 12, 2017

Oh, got it. In this example, you mean literally doing inference over the joint p(eps, x) = N(eps | 0, 1) I[ x == tanh(eps * W)] and variational distribution q(eps | x).

eb8680 (Collaborator) commented Nov 12, 2017

In this example, you mean literally doing inference over the joint p(eps, x) = N(eps | 0, 1) I[ x == tanh(eps * W)] and variational distribution q(eps | x)

Yep, although a hard constraint x == y on a continuous-valued distribution is false almost surely, so in this case you could use a Normal with mean y and very small variance as your observation distribution. This is a very basic version of an ABC likelihood with a Gaussian kernel, implemented manually within the model.
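
(A minimal sketch of that Gaussian-kernel observation for the tanh example above, again assuming the present-day Pyro API; the bandwidth 0.01 and the parameter initialization are arbitrary.)

import torch
import pyro
import pyro.distributions as dist

def implicit_model(x_obs):
    W = pyro.param("W", torch.randn(1, 2))
    eps = pyro.sample("eps", dist.Normal(0., 1.))
    x_mean = torch.tanh(eps * W).squeeze(0)  # deterministic push-forward of the noise
    # soft observation: a narrow Normal around the deterministic output stands in
    # for the intractable hard constraint x == tanh(eps * W)
    pyro.sample("x", dist.Normal(x_mean, 0.01).to_event(1), obs=x_obs)

A guide over eps (for example a Normal with learnable location and scale) plus SVI would then approximate q(eps | x) and update W through the ELBO.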

dustinvtran (Collaborator) commented Nov 12, 2017

So I guess the takeaway re: universality is that the condition operator holds for continuous non-invertible programs, but the algorithm itself fails because the indicator would almost surely not hold. A Gaussian kernel could work for a naive algorithm in practice, although it's not the same program.

eb8680 (Collaborator) commented Nov 12, 2017

the algorithm itself fails because the indicator would almost surely not hold

Yeah, see "Running Probabilistic Programs Backwards" for an interesting discussion of the problem of conditioning on rare or complex events in a PPL.

riedelcastro commented Nov 12, 2017

I tried this

def add_one_or_two(guess):
    init = Variable(torch.Tensor([2]))
    choice = pyro.sample("choice", dist.categorical, ps=guess, vs=[False, True])
    if choice:
        outcome = init + 1
    else:
        outcome = init + 2
    return pyro.sample("outcome", dist.delta, outcome)

guess = Variable(torch.Tensor([0.5, 0.5]))
conditioned = pyro.condition(
    add_one_or_two, data={"outcome": Variable(torch.Tensor([4]))})
marginal = pyro.infer.Marginal(
    pyro.infer.Importance(conditioned, num_samples=100), sites=["choice"])

marginal(guess)

but this gives me

RuntimeError: invalid argument 2: invalid multinomial distribution (sum of probabilities <= 0) at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorRandom.c:230

Any ideas?

fritzo (Member) commented Nov 14, 2017

It seems to work to follow @eb8680's suggestion and add a little noise to the output:

def add_one_or_two(guess):
    init = Variable(torch.Tensor([2]))
    choice = pyro.sample("choice", dist.categorical, ps=guess, vs=[False, True])
    if choice:
        outcome = init + 1
    else:
        outcome = init + 2
    # observe the outcome through a narrow Normal instead of a hard Delta
    return pyro.sample("outcome", dist.normal, outcome, 0.1 * ng_ones(1))

guess = Variable(torch.Tensor([0.5, 0.5]))
conditioned = pyro.condition(add_one_or_two, data={"outcome": Variable(torch.Tensor([4]))})
marginal = pyro.infer.Marginal(pyro.infer.Importance(conditioned, num_samples=100), sites=["choice"])

marginal(guess)
# => {'choice': array([False], dtype=bool)}

ngoodman (Collaborator) commented Nov 17, 2017

By the way, extending the currently implemented algorithms to include one that can deal with implicit models (i.e. observing a stochastically computed value from a function that doesn't come with a scoring function) is one of our next todos. Basically, you can use a discriminator as an estimator for the likelihood ratio needed in the ELBO. (@dustinvtran and @karalets have both done things like this in papers!) IMHO this is the right way to extend Pyro (as an optimization-focused PPL) to have something like the condition operator of Church et al.

eb8680 closed this Dec 18, 2017

N-McA commented Dec 23, 2017

The paper mentioned by @ngoodman is "Hierarchical Implicit Models and Likelihood-Free Variational Inference" (I believe), available on arXiv.

innuo commented Feb 13, 2018

Even if we can model them with Delta samples or Normal samples with small variance, observations that are deterministic transformations of latent variables have another consequence when performing variational inference.

Assume the following model, where we are interested in learning something about mu given the observation x = x0. (Assume a is known and f is a deterministic function.)

z ~ Normal(mu, a)
x ~ Normal(f(z), sigma=epsilon, obs=x0)

In the guide, the latents can be modeled as being sampled based on the parameters or as small deviations of the observed values.

Either the guide looks something like

mu = param
z ~ Normal(mu, a)

or
z ~ Normal(approx_f_inverse(x0), sigma=epsilon)

In the first case, because x0 has negligible probability given the guide-sampled z, convergence is very slow.

The second case doesn't allow learning about mu at all because the sample z doesn't depend on mu.
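
(A minimal sketch of the model and the two guide choices described above, assuming the present-day Pyro API; f, its approximate inverse, a, epsilon and x0 are illustrative stand-ins.)

import torch
import pyro
import pyro.distributions as dist

a, epsilon = 1.0, 0.01
f = torch.tanh                  # stand-in deterministic transform
approx_f_inverse = torch.atanh  # stand-in approximate inverse
x0 = torch.tensor(0.5)          # stand-in observation

def model(x0):
    mu = pyro.param("mu", torch.tensor(0.))
    z = pyro.sample("z", dist.Normal(mu, a))
    pyro.sample("x", dist.Normal(f(z), epsilon), obs=x0)

# Guide 1: z is sampled around the learnable mu, so gradients for mu exist,
# but almost every guide sample gives x0 negligible likelihood.
def guide_prior(x0):
    mu = pyro.param("mu", torch.tensor(0.))
    pyro.sample("z", dist.Normal(mu, a))

# Guide 2: z is pinned near the approximate inverse of the observation, so x0
# is explained well, but the guide carries no dependence on mu.
def guide_inverse(x0):
    pyro.sample("z", dist.Normal(approx_f_inverse(x0), epsilon))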

How can we handle a situation like the one above? Does the likelihood-free approach mentioned above handle this?

Any pointers on how inference in the above model can be accomplished with Pyro currently?

erlebach commented Dec 29, 2018

I had a similar problem, with a variable

x = pyro.sample("x", dist.Normal(0., 1.))
w = 3.
y = pyro.sample("y", dist.Delta(x * w), obs=x * w)

Notice the use of a Delta sample, as opposed to sampling from a Normal distribution with mean x*w and small variance. The difference between the two approaches would be in the guide (for a variational method, which I am using). Using the Delta distribution, there is no need for a corresponding guide sample; after all, y is observed. The guide must contain the corresponding sample variable if the Normal distribution is used.
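
(For what it's worth, a minimal sketch of that setup with a guide covering only the latent x, assuming the present-day Pyro API; the optimizer settings are arbitrary.)

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
from torch.distributions import constraints

def model():
    x = pyro.sample("x", dist.Normal(0., 1.))
    w = 3.
    # the deterministic transform is recorded as an observed Delta site
    pyro.sample("y", dist.Delta(x * w), obs=x * w)

def guide():
    loc = pyro.param("loc", torch.tensor(0.))
    scale = pyro.param("scale", torch.tensor(1.), constraint=constraints.positive)
    pyro.sample("x", dist.Normal(loc, scale))

svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())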

A related question is whether a deterministic function of a sampled variable can be considered observed.

Thanks.
