Alternative design of variable scoping and naming #502
We're definitely planning to add a dynamic scoping mechanism of some sort for both […]
Sorry, but I don't really get this. The plan seems to be to copy TensorFlow's heavyweight design. PyTorch is so nice because it piggybacks on Python's natural scoping rules. Clarification: it would seem natural to me that params would behave like any other object in PyTorch (variables, params, etc.). You don't access it by a global string in some global state dictionary, but by actually having access to the variable. That way you don't break the standard coding abstractions in Python. This also allows you to easily introduce anonymous terms and parameters and avoids ugly string manipulation for lookups.
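The contrast being drawn here can be sketched in plain Python. Both styles below are toy stand-ins (not actual Pyro or PyTorch code): one uses a global string-keyed store, the other makes parameters plain attributes reachable only through a reference to their owner.

```python
# Toy contrast (hypothetical stand-ins, not real Pyro/PyTorch code).

# Style 1: a global string-keyed store. Any code anywhere can reach any
# parameter by (re)constructing its name, and name collisions are silent.
PARAM_STORE = {}

def param(name, init):
    # Return the stored value, initializing it on first access.
    return PARAM_STORE.setdefault(name, init)

def use_global_style():
    return param("layer1.weight", 1.0)

# Style 2: parameters as plain attributes. Access requires a reference to
# the owning object, following Python's ordinary scoping rules.
class Layer:
    def __init__(self, init):
        self.weight = init  # reachable only through this instance

def use_scoped_style(layer):
    return layer.weight
```

The point of the sketch: in style 1 any code holding the right string can read or clobber a parameter, while in style 2 access is mediated by ordinary references.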
> For models that can be written as […]

Unfortunately, parameters in Pyro don't always work this way; consider e.g. per-datapoint variational parameters that are created on the fly each time a new data point appears, or parameters whose existence depends on a random variable.
Not sure I understand. Why should you be able to sample a random variable that you don't have pointer access to? Needing to do that sounds dangerous. Why does having non-global scoping prevent per-datapoint parameters? Presumably whatever object is creating the parameters should own them, not some global store. How come these issues don't come up in PyTorch? In theory you have to do many of the same things in standard deep learning, and yet there are no globals.

(Sorry, this library is great, and I was thinking of using it for teaching. But I am really worried about this design decision.)
No problem! The parameter store is definitely a weak point, and this is great feedback.
I guess we could have a per-model parameter store instead, though, sort of like a dynamically generated […]
I guess I get that, but now I'm much more worried about this type of name matching. This feels very scary:

> To be very explicit, if the model contains a random variable z_1 […] def model(): […] def guide(): […]

If model were a class with member z_1 = sample, then this could be done correctly instead of asking the user to manage this alignment.
To compute the Monte Carlo ELBO […]

In Pyro, we made the choice to completely separate the definitions of model and guide and allow both of them to be callables with arbitrarily different structure and control flow. The user pays for this flexibility by having to manage things themselves, but could implement the safer functionality you describe on top of the more general behavior we provide.
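For reference, the single-sample Monte Carlo ELBO estimate alluded to above can be sketched in plain Python for a toy conjugate Gaussian model. All names here are illustrative, not Pyro's API:

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def elbo_estimate(x, q_mu, q_sigma, num_samples=100, seed=0):
    """Monte Carlo ELBO for the toy model
         z ~ N(0, 1),   x | z ~ N(z, 1),
       with guide q(z) = N(q_mu, q_sigma^2):
         ELBO ~= (1/S) * sum_s [log p(x, z_s) - log q(z_s)],  z_s ~ q.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(q_mu, q_sigma)  # sample from the guide
        log_joint = normal_logpdf(z, 0.0, 1.0) + normal_logpdf(x, z, 1.0)
        total += log_joint - normal_logpdf(z, q_mu, q_sigma)
    return total / num_samples
```

A useful sanity check: for this model the exact posterior is N(x/2, 1/2), and when the guide equals the posterior the per-sample estimate is exactly the log marginal likelihood log N(x; 0, sqrt(2)).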
Okay, sure. I get it, I just disagree. I would love to have a PyTorch-style mode that looks like this: […]

This seems clean to me, has no globals, and could even allow you to reuse guide functions with differently named samples.
This is actually the design in Edward (e.g., […]). String-matching is a sensible solution for Pyro. A similar manual alignment could, for example, assume identical class members in […]
Could one of you give me an actual example that can't be handled with real variable scoping? Just asserting that this is reasonable is not convincing. It seems to me like PL 101 that a programming language should avoid having the user reinvent scoping using global string lookups. You say that this is a problem with dynamic functions. But PyTorch is a dynamic graph library that never requires the user to do this kind of black magic.
An example of what we're discussing is […]
Manual alignment of different names in model and guide is a separate issue from scoping. It's actually possible now to do manual alignment in Pyro; that functionality is already implemented in […]

I guess I don't really understand what you're proposing. Scoping for samples is about names, not access; sample statement inputs and outputs are lexically scoped just like function applications. Consider the following rather contrived example:

```python
def chain(x, t):
    if t == 0:
        return pyro.sample("{}".format(t), dist.normal, torch.zeros(1), torch.ones(1))
    else:
        return chain(pyro.sample("{}".format(t), dist.normal, x, torch.ones(1)), t - 1)

def model(x, t):
    mu = chain(x, t)
    return pyro.sample("y", dist.normal, mu, torch.ones(1))

def mean_field_guide(x, t):
    latents = []
    for i in range(t):
        latents.append(pyro.sample("{}".format(i), dist.normal,
                                   pyro.param("m_{}".format(i)), torch.ones(1)))
    return latents
```

How should the sample names be scoped automatically without changing the model code? How should an inference algorithm align the guide samples with the model samples?
@dustinvtran Not sure I understand. I'm arguing that global scoping is not necessary. @eb8680 I see these issues as very connected. The fact that string alignment is done this way is a symptom of a programming language using a problematic internal scoping model. I made my proposal above. If you want to expose something to an inference algorithm, you should have to have a reference to it, not a string in a global namespace. Innumerable programming languages thought it was a good idea to have global string references (early PHP, JavaScript, etc.), but that always turns out to be a bad idea, and we think of them as bad languages to work with. It seems early enough that Pyro can avoid this issue.
@srush I updated my example to make my point clearer. I really do appreciate your feedback; it's very helpful for thinking through these issues.
The sample namespace isn't global, it's local to the model. It seems to me that your proposal just moves the awkwardness from naming into alignment, since that can no longer be done automatically when the model and guide have different scope structures.
It seems effectively like a global to me. You access it by constructing its global string name, not by having a reference to it. Compare this to the way PyTorch params respect the natural scoping rules of Python. You cannot magically access them; you need a reference. This is how I expect non-global variables to work. I think I understand your example. How about this?

```python
class Model:
    def __init__(self):
        self.mus = pyro.LatentList(dist.normal)  # no global string name; can't be accessed without a pointer
        self.x = pyro.Latent(dist.normal)

    def forward(self, t):
        mu = self.mus.next_sample(torch.zeros(1), torch.ones(1))
        for i in range(t):
            mu = self.mus.next_sample(mu, torch.ones(1))
        return self.x.sample(mu, torch.ones(1))

class Guide:
    def __init__(self):
        self.q = pyro.LatentList(dist.normal)  # no global string name; can't be accessed without a pointer

    def forward(self, t):
        return [self.q.next_sample(pyro.param()) for i in range(t)]

m = Model()
g = Guide()
SVI2(variational=[[m.mus, g.q], ...])
```

Or alternatively you could have a […]
we've talked about doing things of this sort […] https://gist.github.com/martinjankowiak/af061c7a00e86e2874a217a0853efb2c

on the left is the model and on the right is (one possible factorization of) the guide. the way in which i need to sample things in the guide is quite different from the natural ordering in the model. constructing the guide and aligning it with the model using some generalization of […]

we all agree that we need to think through how naming works more carefully, and this is certainly a good time to be doing it. i think it's definitely worth taking a look at various constructs like […]
Thanks Martin. That's helpful to know. Let me reiterate my point once more, though, because I feel it is being lost. My argument is not for a particular style of handling locals (I made up this LatentList thing on my phone); it is against globals as the default option that is the "documented way to do it". If there are corner-case models that really require string-matching-based alignment, then have a LatentDict object that is owned by that part of the model and can use whatever strings it wants. This gives the ability to "support all possible model/guide structures/alignments", since you can then do something similar in the Guide. This has all the functionality you want but does not require what, to me, feels like late-90s PHP query-string construction.
I think we all agree it would help to have more concrete examples 😃 To help explore syntax, I hacked together a pyro.anon module on an […]
Yeah! This is perfect, exactly what I imagined. Maybe it can be a bit less verbose, but it checks all my boxes. Also it is hierarchical, so you can easily nest groupings of latent variables without string building. (It's kind of a state monad in Python.) My one quibble is that it still matches the guide variables by name, but otherwise it's way better.
@fritzo that's pretty neat. It might make sense to implement a […]

@srush we'll definitely consider including something like that as we iterate on naming and scoping, though I see it existing alongside rather than replacing the current syntax, much like […]
So just imagining a bit how you could make this even more friendly:

```python
@pyro
def TimeSeries(model, obvs):
    model.sigma = anon.param(Variable(torch.ones(1), requires_grad=True))
    model.steps = []
    for obv in obvs:
        prev = model.steps[-1].y if model.steps else Variable(torch.zeros(1))
        model.steps.append(TimeStep()(obv, prev, model.sigma))

@pyro
def TimeStep(model, x, prev, sigma):
    model.y = anon.sample(dist.normal, prev, sigma)
    model.x = anon.observe(dist.normal, x, model.y, sigma)

@pyro
def Guide(model, obvs):
    model.mus = [anon.param(Variable(torch.zeros(1), requires_grad=True)) for _ in range(len(obvs))]
    model.sigmas = [anon.param(Variable(torch.ones(1), requires_grad=True)) for _ in range(len(obvs))]
    model.ys = [anon.sample(dist.normal, model.mus[i], model.sigmas[i]) for i in range(len(obvs))]

def main(args):
    model = TimeSeries()
    guide = Guide()
    optim = Adam({"lr": 0.01})
    inference = SVI(model, guide, optim, loss="ELBO",
                    align=lambda m, g: [([s.y for s in m.steps], g.ys)])
    data = Variable(torch.Tensor([0, 1, 1, 0, 1, 2]))
    for step in range(args.num_epochs):
        loss = inference.step(data)
        if step % 100 == 0:
            print('{}\t{:0.5g}'.format(step, loss))
    print('Parameters:')
    # Will automatically collect all the sub-variables and […]
    for name in model.get_all_variable_names():
        print(name)
```
haha @eb8680 I think you have to make a choice :) torch.nn is not a great analogy, as it is entirely built on top of torch.autograd. Building a nice, well-scoped library using anonymous names in a global store with scary functions like […]
I agree. To reiterate my last comment:
Oh yes @dustinvtran I agree. I was mostly responding to a change to @fritzo's sample code. My main mission here is to kill the global variable store. I see these string-alignment sub-issues as symptoms of the global variable store.
@srush I like your decorator idea. I've updated examples/anonymous.py to use an […]
Nice! This is looking great, let me play around a bit and try some of the examples.
Oh wait, I see @fritzo. So the […]

What's the argument for why that is necessary? Now that you have actual objects, why not just give them a random address? Am I missing something?
Hmm, assignment from a return value doesn't generally work yet (since the returned latent must be attached to the top-level model before assignment). But it does work to let subfunctions mutate parts of the latent:

```python
@anon.function  # decorator is only used for top-level models
def model(latent, size):
    latent.params.mu = Variable(torch.zeros(1))
    latent.params.sigma = anon.param(Variable(torch.ones(1)))
    latent.samples = []
    my_subfunction(latent.params, latent.samples, size)

def my_subfunction(params, samples, size):
    for i in range(size):
        samples.append(anon.sample(dist.normal, params.mu, params.sigma))
```
One crucial behavior you might be unaware of is that […]
oh! I see, that makes a lot of sense. That also explains a lot of the decisions I didn't understand. |
yep, refer to the elbo implementation to see how the 'alignment' occurs under the hood. in […]
Hmm, so given that this swap happens through some CPS/poutine trick that is invisible to the user […]

That way the user can grok that […] (Or alternatively through a […])
Here's a similar notation that works: examples/attribute.py. Feels very protobuffy. |
I like it. The object setting is a little too weird for me though :) Maybe go even more protobuffy?
I think dict could work out of the box as well.
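The idea running through this exchange, that nested attribute and item access can replace manual string building, can be sketched with a tiny container. The class and names below are hypothetical illustrations, not the actual pyro.anon API:

```python
class Latent:
    """Toy attribute/dict-style latent container (hypothetical, not pyro.anon):
    nested attribute and item access derive hierarchical names internally,
    so user code never concatenates name strings by hand."""

    def __init__(self, name="latent"):
        # Bypass attribute protocol cleanly for internal state.
        self.__dict__["_name"] = name
        self.__dict__["_children"] = {}

    def __getattr__(self, key):
        # Called only when normal lookup fails; lazily create a child node
        # whose full name is derived from the parent's.
        child = self._children.get(key)
        if child is None:
            child = self._children[key] = Latent("{}.{}".format(self._name, key))
        return child

    def __getitem__(self, index):
        # Indexed children, for array-like groups of latent variables.
        key = "[{}]".format(index)
        child = self._children.get(key)
        if child is None:
            child = self._children[key] = Latent("{}[{}]".format(self._name, index))
        return child

    def full_name(self):
        return self._name
```

The user writes `root.params.mu` or `root.samples[0]` and the hierarchical name exists only as an internal detail; repeated access returns the same child object, so identity (not string equality) is what code passes around.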
And actually at this point you could remove the […]
@fritzo Okay, I went through and ported all the documentation to use anon. (I mostly did the intro and SVI; a lot of the other documentation is not runnable.) https://github.com/harvardnlp/pyro/tree/anonymous-sites/tutorial_ette/source Mostly it worked great. Here are the remaining issues I saw:

Similarly […]

I think the anon version should have these string names not be globals, but localized to the function (scale). Also, we should be able to pass in arrays and reference nested modules ("submodel.mu"). Maybe for complicated data, there should be a […]
Okay, fixed most of these issues in my branch: https://github.com/harvardnlp/pyro/blob/anonymous-sites/pyro/anon.py https://github.com/harvardnlp/pyro/blob/anonymous-sites/tutorial_ette/source/intro_part_i.ipynb There is some more to do, but I think this is already way cleaner: it maintains scope, makes the semantics cleaner, supports real arrays and nesting, and uses less code.
Trying to find a compromise that we could merge soon: #540 |
@srush thanks for your ongoing feedback. This issue keeps growing in scope, and it would be really helpful if you could open new issues about things like inadequate documentation for […]

As I said, you've made some great points about the parameter store, whose design was largely borrowed from webPPL, and changes like the ones you've sketched seem like clear wins. We're also more than willing to include an […]

However, let me try to give some more background on the way […]
There's not really any way around this, and […]
You've also identified the same fundamental problem with your model-guide alignment solution: in general, aligning execution traces or observations with an arbitrary Pyro program can be as hard as executing the program, since you need to refer to variables before they exist. As @fritzo said: […]
A corollary to this fact is that in Pyro's current incarnation, sample names don't really correspond conceptually to Python variable names and scopes, and there's no particular reason that they should unless the model and trace being aligned already share the same structure; think of them instead as tags or database queries attached to stochastic function applications (where the function and its return value could both be anonymous Python variables) that solve the problem of alignment by construction.

This design choice is rooted in both practical considerations and historical context. A particularly common workflow in Pyro is to write a single model, then write multiple guides specific to that model with different structure and one or more […]

Historically, previous universal PPLs like Church, webPPL, Venture, and Anglican handled alignment by automatically generating name strings based on call stack position (which could be done easily in Pyro with Python's advanced runtime introspection facilities, e.g. a memoized version of […]). Building up names manually through Pyro programs instead of data and guides is one natural generalization of these previous approaches. It cleanly separates the specification of model, data, and inference, and provides a clear point of contact for more advanced techniques (like code transformations, static analysis, alignment specifications like your proposal that completely decouple model and guide names, or the current implementation of […]).

As @dustinvtran said, Edward's behavior is closer to what you have in mind. This is precisely because it is possible to refer to nodes in a TensorFlow graph before you know their values, which also limits Edward's expressivity. Indeed, the alignment mechanism you added to your […]

Ultimately, we only see Pyro as one hypothesis in the space of possible language designs. It may well turn out to be a bad hypothesis - for example, Edward has clearly demonstrated the utility of static graphs as a probabilistic modelling language, and maybe in practice the benefits outweigh relatively mild restrictions on expressivity. That said, our development process was driven by the interaction of our design principles, properties of previous PPLs, and our tests and anchor examples, and we found that what we have strikes a good balance. Larger changes to the syntax and internals of Pyro represent different hypotheses, and should generally be justified on the basis of similar evidence.
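The call-stack-based naming scheme mentioned above (the Church/webPPL-style approach) can be sketched in a few lines: derive a sample site's name from the chain of call sites leading to it, plus a counter to disambiguate repeated hits of the same address. This is an illustrative sketch, not any PPL's actual implementation:

```python
import inspect
from collections import Counter

def stack_address(skip=1):
    # Build an address from the (function, line) call sites above this frame.
    frames = inspect.stack()[skip:]
    parts = ["{}:{}".format(f.function, f.lineno) for f in frames
             if f.function != "<module>"]
    return "/".join(reversed(parts))

class Namer:
    """Names a sample site by its stack address; a counter disambiguates
    multiple hits of the same address (e.g. calls inside a loop)."""

    def __init__(self):
        self.counts = Counter()

    def fresh(self):
        addr = stack_address(skip=2)  # skip fresh() and stack_address() frames
        n = self.counts[addr]
        self.counts[addr] += 1
        return "{}#{}".format(addr, n)
```

The counter is exactly where this scheme gets fragile: two hits of the same call site (a loop) are distinguished only by order, so reordering the guide's control flow relative to the model's silently changes which sites align, which is part of why explicit names were chosen instead.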
Wait, sorry, I'm confused by this response. I think the last […]

In particular, I went through the exercise of rewriting this file. I think it is reasonably clean looking, and it handles the conditioning case you mention: https://github.com/harvardnlp/pyro/blob/anonymous-sites/tutorial_ette/source/intro_part_ii.ipynb

As I have said above, I have no problem with string-lazy access. My problem is with default global variables as the recommended way you are advising users to use the library. I really like the dynamic design of Pyro, and I agree with your design goals. I just feel that these things do not need to be handled globally but can be built up explicitly in the functions by mirroring the semantics of Python.
There's no failure there! I'm just trying to point out that […]
I meant in the sense of "what flexibility is it giving up?". "Different point" implies there are negatives, but I don't see how it is more constrained. So far we have been able to do all the tutorial examples using it while removing global variables (which I think we can agree are negative).
Curious why all the params are auto-magically global with a shared namespace. Seems like a very different design choice from the cleanly scoped PyTorch variables.