Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use dict to declare type #117

Closed
johnyf opened this issue Jan 30, 2015 · 14 comments
Closed

use dict to declare type #117

johnyf opened this issue Jan 30, 2015 · 14 comments
Assignees
Milestone

Comments

@johnyf
Copy link
Member

johnyf commented Jan 30, 2015

PEP 20: explicit is better than implicit. Inferring the type of a variable from the data type used to describe its domain is fragile. As of c8f8217:

  • the value boolean signifies that the variable is of Boolean type
  • a 2-tuple signifies that the variable is of integer type
  • a list signifies that the variable is of enumerated type, where its possible values are restricted to strings (syntactically defined enclosing them in double quotes).

This approach mixes the type definition with the data type that one will use to represent the domain itself.
The most common error that I keep repeating is to pass an iterable of length 2 to define the limits of an integer domain, but not use a tuple. For example, if this is happening outside tulip and attention is focused on some other detail, then it is quite easy to not use a tuple, or perhaps use a set instead of a list.
The use of types above is to be avoided, for the same reasons that isinstance should be avoided, when possible.

The action item is to use two separate fields in the dict of a variable:

  • a "type" key that takes str values, in particular in {"bool", "int", "str"}
    It may be tempting to admit multiple alternatives for each type, e.g., both "boolean" and "bool". By PEP 20: "There should be one-- and preferably only one --obvious way to do it." so this temptation is resisted here. The above type names have been borrowed from the implementation language (Python).
  • a "dom" key that takes values:
    • for bool the value is ignored
    • for int the value is an iterable of length 2 that contains int
    • for str the value is an iterable of str

Note that this issue does not intend to suggest that a fix is necessary, perhaps rewriting from scratch some parts based on the experience gained so far will yield much better results.

@johnyf
Copy link
Member Author

johnyf commented Jan 30, 2015

For later reference: this time the above error was caused by dumping to json a table of symbol definitions, represented as a dict, then loading it back, which resulted in the tuples that represented the ranges of integer variables being silently converted to lists.

@slivingston
Copy link
Member

I think that your proposed action item is a nice long-term solution insofar as we hope to introduce more types. However, the difficulty that you have encountered---of failing to use a tuple, e.g., due to silent conversions---can be addressed by the following alternative actions:

  • a variable is of integer type if its domain is an iterable with length two and has both elements of type int.
  • otherwise, if the domain is some finite-length iterable, then the variable is of enumerated type.

Note that these actions are similar to the values of the "dom" key that you propose. I recognize that this does not address your first invocation of PEP 20. However, I think introducing a dict for every variable is excessive given how simple and easily distinguishable the currently supported types are.

@slivingston slivingston added this to the 1.2.0 milestone Mar 26, 2015
@johnyf
Copy link
Member Author

johnyf commented Mar 26, 2015

I agree that simplicity is generally to be preferred. However, I already use an extra dict attribute (as a temporary solution) in my own work, because:

  • there is no other place to store additional information like an initial condition (assignment to single variable), a more refined type distinction for integers (saturating vs modwrap), or extra attributes like bitnames, when working on the specification object.
  • typically, compilers have a symbol table, i.e., a dictionary that maps each variable symbol to its attributes. The presence of a dict allows flexibility: one can store only "type" and "dom", someone else can store more attributes, as necessary for their application.
  • the implicit convention causes many problems. Personally, I have spent a lot of time debugging issues and having difficulties because of this convention. It is much more fragile than a dict, and errors can propagate farther easier.
  • related to the previous item, not using a symbol table complicates all the relevant code. This touches on another issue, that of defining the partition of variables into env_vars and sys_vars. A symbol table is more homogeneous: each variable looks the same, and its "owner" key has different value, "env" or "sys". Code can be simplified, because any operation on all variables simply iterates over the symbol table (e.g., an attribute GRSpec.vars). Moreover, a common pattern throughout the code is to repeatedly collect all the variables into a single dict, and then perform some task that doesn't relate to the partition of variables. Also, the use of a single table extends w/o changes to multi-player logics, whereas creating more attributes doesn't address that need very elegantly.

Having said the above, I admit that for human input, initializing a symbol table as a dict-of-dict is cumbersome and I have avoided that myself too. Instead, one can pass a dict of variables to a method add_var. The method can have a convention similar to the one currently used by env_vars and sys_vars. So anyone who wants to specify more can add information directly to the symbol table, either manually or programmatically, and the relevant code gets more robust and somewhat simplified.

The use of a single vars table, together with an set called env_vars has been experimented with in a redesigned transys. Note that that solution is slightly different. I have not yet considered in detail the comparison with the above.

@slivingston
Copy link
Member

Before beginning, note that the proposed action items in the OP involve changes that break a lot of existing code or otherwise require wrappers for backwards compatibility. A general strategy I recommend is

  1. identify legitimate concerns with the existing implementation, and
  2. derive from existing classes if you need new behavior, so that existing interfaces that are not defective remain intact.

Addressing your specific comment, let us be concrete. I am referring specifically to variable representations in the classes LTL and GRSpec. In your reply above you introduced new action items:

  1. consolidate env_vars and sys_vars into a single vars, and introduce an "owner" key to indicate the owner.
  2. facilitate other annotations, like whether an integer-valued variable saturates or modwraps.

I am strongly opposed to the first item. There is much code that relies on there being env_vars and sys_vars. Changing it will require many internal changes to TuLiP and will break other uses of TuLiP, and I do not think internal simplifications are sufficiently impressive to justify the effort imposed on users. Regarding the second item, can you not meet your needs by creating a new class, e.g., LTLAnnotated that derives from LTL?

You condemn "the implicit convention" broadly, but can you give specific motivation for this particular issue (referring back to the OP)?

Finally, I want to emphasize that introducing symbol tables into the LTL class is interesting in the long term. However, I do not think it is of sufficiently broad interest yet to justify breaking compatibility, especially if you can derive a new class for your own work.

@slivingston slivingston removed this from the 1.2.0 milestone Mar 26, 2015
@slivingston
Copy link
Member

I just removed the 1.2.0 milestone to not arbitrarily restrict the timescale on which this issue is addressed, especially if we decide to use a dict or introduce symbol tables, etc.

@murrayrm
Copy link
Contributor

Can someone give an example of how user code would be modified under the proposed change? A before/after example would be good, perhaps one of the standard examples that we use?

@slivingston slivingston added this to the 1.3.0 milestone Jul 31, 2015
@slivingston
Copy link
Member

I think that the OP is well argued and that the action items should be performed for version 1.3.0. I recommend that this change is not made for version 1.2.0 for the same reasons that motivate a period of feature-freeze (i.e., no additional features; only bug-fixes) before release.

After the OP, discussion turned to other motivations and possible uses of symbol tables. While I think these could be valuable, I want to keep this issue focused on the action items of the OP. After such a dict is introduced, we can re-visit discussion about additions to it.

@murrayrm: I skimmed the examples but did not find one that would be convenient for demonstration. While there are examples involving variables of all three types, the user-facing code does not reveal them. E.g., the int type is obtained automatically when translating a FiniteTransitionSystem to an LTL formula, and the names of modes of a switched system are translated to values of a str type, i.e., arbitrary finite domain (referred to as enumerated type in the OP).

For discussion, here is a small example of how we currently may create a specification involving all three types. Note that a common alternative style is to create the dict sys_vars and the list sys_safety first and then instantiate GRSpec with them.

from tulip.spec import GRSpec
f = GRSpec()
f.sys_vars = {'v1': 'boolean', 'v2': (0,3), 'v3': ['tofu', 'beef', -1, 2]}
f.sys_init = ['v1', 'v2 = 0', 'v3 = "beef"']
f.sys_safety = ['v1 -> X(v3 = "tofu")']

Following the comment by @johnyf above, we may change the line above in which f.sys_vars is modified to be

f.add_var('v1')
f.add_var('v2', (0,3))
f.add_var('v3', 'str', ['tofu', 'beef', -1, 2])

where add_var() is a new method of the class GRSpec that provides convenient creation of variables for humans. In the third line, we use an optional declaration of the type to remove ambiguity (and disable type inference). Notice that I did not declare these as "system" variables. Instead, I am proposing a design where 'sys' is default but the owner could be changed, e.g.,

f.add_var('v2', (0,3), owner='env')

@johnyf
Copy link
Member Author

johnyf commented Mar 27, 2016

After implementing these approaches in the package omega, I have the following observations to report.

My proposed style to define a full symbol table is cumbersome for external users.
It has been very useful for making the implementation simpler and more readable, and for developer usage.

The availability of a dict as symbol table has enabled forwarding first-order information through the bitblaster for use in picking a variable order, even though this use case was not planned for.

It also works for multiplayer games, whereas the env-sys classification would need to be modified (i.e., tracking an "owner" is necessary for representing games with multiple players, irrespective of whether detailed type information is provided).

To avoid confusion below: there are roughly 4 possible symbol tables:

  1. as currently in GRSpec

    a = tulip.spec.form.GRSpec()
    a.env_vars = dict(x='boolean', y=(0, 3))
    a.sys_vars = dict(z=(2, 10), w=['tofu', 'beef', 'foo'])
  2. a simple, human-friendly approach similar to 1, implemented in omega.automata.TransitionSystem and converted to 3 by omega.symbolic.logicizer._vars_to_symbol_table.

    g = omega.automata.TransitionSystem()
    g.vars = dict(x='bool', y=(0, 3), z=(2, 10), w=['tofu', 'beef', 'foo'])
    g.env_vars = set(['x', 'y'])

    A small example is here.

    Note that this is for a transition system, which is then converted to a omega.symbolic.Automaton. If initializing the latter directly (usually I combine these two approaches, or use a structured input language), then the symbol table is written in form 3 directly (or the symbol table output from graph_to_logic augmented).

    Also, notice that this doesn't work very well for multi-player games, without any glue code or manual monkey patching. As soon as we need to define an arbitrary owner per variable, it becomes convenient to use a separate dict for each variable (we can also define a dict per player, but I find the former simpler and more homogeneous for iterations and search).

  3. detailed homogeneous symbol table (compiler-like) that a human can still write. This is a dict-of-dict where we define type, domain, owner, and -optionally- initial value. Explicit domain definition allows distinguishing between modwrap and saturating integer semantics (integers in tulip have saturating semantics).

    g = omega.symbolic.symbolic.Automaton()
    g.vars = dict(
        x=dict(type='bool', owner='env'),
        y=dict(type='int', dom=(0, 3), owner='env'),
        z=dict(type='saturating', dom=(2, 10), owner='sys'))
    # `'w'` not supported in `omega` (Booleans and integers), but would be similar
  4. bitblasted symbol table: 3 augmented with mapping to bitfields, width, and signdness. This is used by the machine, so not to be discussed here.

The formats 2, 3, 4 are documented in omega/doc/doc.md.

As commented here, there is also the convenience function omega.logic.bitvector.make_table that converts a simple table (2) to a detailed table (3). An example is in the tests here.

I think that each interface has its purpose, because use cases at different levels of involvement (by the same person) with the backend admit different input as more convenient. I have kept all of 2, 3, 4, though I use mostly 3. It seems that more detailed input by humans is better accommodated by using a specification language (Promela, etc.) as input. Whether that should be typed or not is a different discussion.

Footnote: omega supports Boolean and integer variables. I am skeptical whether string or mixed variables are very useful at the level of algorithm development. I think that they should be supported by the input language, but be mapped to integers by the compiler's frontend. I regard tulip primarily as an integration of interfaces to algorithm implementations, so such a conversion need not be at the core, but can live in the frontends.

@johnyf johnyf self-assigned this Jul 31, 2016
@johnyf johnyf modified the milestones: 1.4.0, 1.3.0 Aug 15, 2016
@johnyf
Copy link
Member Author

johnyf commented Nov 26, 2016

I think that the action item to close this issue should be to add a method GRSpec.add_var, as noted in passing above, and detailed above.

A method abstracts away from the internal implementation, allowing for flexibility. it does not require that the user know and manipulate the data structure directly (attributes of GRSpec in this case). This particular approach proved very versatile in dd, especially to achieve uniformity across modules that implement interfaces to quite different BDD implementations (Python, CUDD, Sylvan, BuDDy), each of which represents and manages variables very differently.

A more recent experiment of mine is omega.symbolic.fol.Context, which is a first-order convenience layer on top of BDDs. There, I created a method add_vars (dd has featured add_var until now, but not any add_vars, because I haven't yet been convinced to extend or change the API. networkx generally favors offering both singular and plural variants for several methods in its API).

A method as interface has the added advantage that it can execute code with arguments (as opposed to an attribute---even if it is @property). For Context, this allows checking for existing variables when new variables are added, and raising errors if an integer variable name has already been defined, but with a different refinement (by which I mean a different representation as bits at the lower level---sometimes considered a "type"). Clearly, this is everyday compiler functionality, but mentioned as an added benefit of using a method.

The existing API of directly changing attributes will remain unaffected. In the future, if a good reason is found, it could be modified. My expectation is that all input will shift to a suitable input language with proper variable declarations, thus offering an earlier entry point to users.

@murrayrm
Copy link
Contributor

From a user perspective, I like something like this:

f.add_var('v1')
f.add_var('v2', (0,3))
f.add_var('v3', 'str', ['tofu', 'beef', -1, 2])

much better than this

g.vars = dict(x='bool', y=(0, 3), z=(2, 10), w=['tofu', 'beef', 'foo'])

Of course, these are not incompatible (the add_var() method could be something that has reasonable defaults and interpretations to make things look clean.

@johnyf
Copy link
Member Author

johnyf commented Nov 27, 2016

I agree, in dd it works like this:

b = dd.bdd.BDD()
b.add_var('a')
b.add_var('b', level=5)

By default, new bits are added at the next level that is not occupied. Passing a level explicitly overrides this behavior.

The method omega.symbolic.fol.Context.add_vars takes a dict (it needs to take some sort of container), so for the methods with plural names it will probably still be a dict. Nonetheless, this suits both beginners and experts / code: beginners use add_var, one at a time, and experts can pass a dict of dict (in absence of values defaults can be used: for example, the player that owns a variable), similarly for code.

My main motivation when implementing add_vars was to implement some checks (name conflicts with existing variables etc.), so I didn't try to simplify the interface at that time.

@johnyf
Copy link
Member Author

johnyf commented Mar 3, 2017

More arguments in favor of not manually defining per variable owners, to add to the arguments above:

  • It is repetitive and tedious to define an owner per variable. With 10 variables controlled by a player, we need to write 10 times the name of that player. Not at all practical.
  • More readable to group the variables of each player in a list, one per player. Otherwise one has to look through the entire symbol table and mentally collect a player's variables. Writing them as a list saves this effort.
  • Variables are symbols used as names of nullary operators in a formal system. They aren't owned by any player. They represent players via quantification and the way we use them in formulas. Moreover, a variable may be a parameter or represent one subsystem at one level and a component of that subsystem at another level. So, we shouldn't impose on the user to associate an "owner" with each variable. Groups of variables are additional information, and should be allowed to even overlap. This is more conveniently done with a dict-of-list-of variable names. It generalizes sys_vars and env_vars from tulip.spec.form.GRSpec attributes to keys in this dict.

Exotic semantics (saturating "integer variables", modwrap) suggest types. They are implicit and confusing. I never use them. I removed the silent automated generation of type constraints from omega, requiring the user to write the type constraints explicitly in logic. The ranges of integers in the symbol table are simply type hints to help the tool choose how many bits to use for refining an integer. In other words, a hint of what the user thinks should suffice to process the intended specification. The specification itself should all be at one place: in a text string (e.g., a TLA+ module).

Of course the tools can be annotating variables in symbol tables however is needed by the code. But the user interface need not reflect this activity.

@slivingston slivingston modified the milestones: 1.4.1, 1.4.0 Jul 22, 2017
@slivingston
Copy link
Member

@johnyf any updates on this?

I am available to review any work you may have on this. Alternatively, I can try to implement the proposed add_var function.

@johnyf
Copy link
Member Author

johnyf commented Jul 23, 2017

I added a method GRSpec.declare on branch declare_vars. Although complete, I would like to examine the implementation a little more before opening a PR.

Boolean-valued variables are given as positional arguments (the absence of a type hint signifies that a variable is Boolean-valued). Other variables are given as keyword arguments, where the value is either a pair of integers, or a sequence of strings. For example:

f.declare(
    'v1',
    v2=(0, 3),
    v3=['tofu', 'beef', '-1', '2'])

Passing env=True signifies environment variables (almost as suggested above). For example:

f.declare('v4', env=True)

Giving type hints as keyword values, instead of separate arguments is necessary if declare is to accept multiple variables that aren't passed within a dict argument (doing so would defy the motivation for introducing declare, due to the nesting). With two players:

  • a bool option suffices (compared to owner='env')
  • It suffices to call declare twice.

Reserving the keyword "env" precludes defining a variable named "env" via a call to declare. I do not recall ever using "env" as a variable name. In any case, this is a small limitation, circumvented by directly modifying the attributes GRSpec.env_vars or GRSpec.sys_vars (which as of f177062 equal the superclass's LTL.input_variables and LTL.output_variables, respectively).

Redeclarations are allowed, but must match existing type hints and owners, thus have no effect (the docstring of method declare describes more details). Also, multiple calls are an option, for example:

f.declare('v1')
f.declare(v2=(0, 3))
f.declare(v3=['tofu', 'beef', '-1', '2'])

Regarding earlier comments from above

Until now I have experimented with various approaches in omega-based code. My tentative conclusion is:

  • One symbol table vars with all type hints for both flexible and rigid variables. This is especially useful for rigid variables, because they may represent no component.

  • Separate lists of variable names (only--no type hints) that define components, in an attribute varlist; a dict that maps each component to a list of variables (ditto).

This approach is flexible and not repetitive. Remarks from usage experience:

  • writing owner=... for each variable in a detailed symbol table is too repetitive (ditto)

  • a method that declares one variable imposes multiple calls

    t = Something()
    t.add_var('a')
    t.add_var('b')
    t.add_var('c')

    or the slightly less repetitive, but not as readable:

    t = Something()
    [t.add_var(var) for var in ['a', 'b', 'c']]

    I used this approach for two years in dd, until I was convinced to add a method to declare multiple variables at once. For this reason, declare accepts multiple variables.

  • naming: "declare" appears to be a better choice than add_var. Arguments: consider renaming add_var to declare dd#25

  • flexible or rigid variables? Not so trivial. Practice shows that the TLA+-inspired method names declare_variables and declare_constants are good choices (and in the spirit of PEP 20). Separate methods are useful because, given unprimed identifiers, the method declare_variables declares both unprimed and primed identifiers, whereas declare_constants only unprimed identifiers. Interestingly, it is possible to pass primed identifiers as keyword arguments, but indirectly:

    def f(**kw):
        print(kw)
    
    d = {"a'": (0, 10)}
    f(**d)

    Nevertheless, declare_variables is simpler to use and more readable.

    tulip currently has a notion of flexible variables, but not constants yet (after the API of omega stabilizes it would be worth reconsidering how rigid variables are to be handled). So declare suffices for now (as done in dd for BDD variables).

  • defining components using a dict of lists of variables is very flexible. Multiple components can be defined. Components can be changed w/o modifying the type hints (otherwise one would have to transfer both variable names and type hints between attributes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants