Make `Term` and `Type` agnostic to the type of variables, in prep for supporting rich function layou #26

pchiusano · 2015-07-29T22:15:51Z

This was a pretty tedious but straightforward change. Rather than hardcoding Symbol as the variable type used by ABT, it is now a parameter, bounded in most places with the typeclass Var (in Unison.Var). So we have Term v with variables in v, Type v with variables in v, etc.

With v being the old Symbol type, we have the old behavior. But v could also carry mixfix operator info, or the generalized Doc-based layout I am going to be implementing next. This also makes it easy to fiddle with different precedence schemes if we want. The only code that is aware of the specific v will be the absolute top level, which fixes v to some concrete type.
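As a rough illustration of the idea (not the actual code in this PR), here is a minimal sketch on a plain lambda-calculus term rather than the real ABT. The `Var` class methods (`name`, `freshIn`), the `Rich` and `Fixity` types, and `freshFor` are hypothetical names chosen for the example; the real `Unison.Var` class may look quite different.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical stand-in for the Var class: the minimum that generic
-- code needs in order to name and freshen variables.
class Ord v => Var v where
  name    :: v -> String
  -- Return a variant of the variable that does not occur in the set.
  freshIn :: Set v -> v -> v

-- The "old" behavior: a plain symbol with a freshness counter.
data Symbol = Symbol Int String deriving (Eq, Ord, Show)

instance Var Symbol where
  name (Symbol _ n) = n
  freshIn used s@(Symbol i n)
    | s `Set.member` used = freshIn used (Symbol (i + 1) n)
    | otherwise           = s

-- A richer variable type that carries layout info next to the symbol.
data Fixity = Prefix | Infix Int deriving (Eq, Ord, Show)
data Rich = Rich Symbol Fixity deriving (Eq, Ord, Show)

instance Var Rich where
  name (Rich s _) = name s
  freshIn used (Rich s fx) = Rich (freshIn symbols s) fx
    where symbols = Set.map (\(Rich s' _) -> s') used

-- Simplified term type (an assumption; the PR uses ABT-based terms).
data Term v = V v | Lam v (Term v) | App (Term v) (Term v)
  deriving (Eq, Show)

freeVars :: Ord v => Term v -> Set v
freeVars (V v)     = Set.singleton v
freeVars (Lam v b) = Set.delete v (freeVars b)
freeVars (App f x) = Set.union (freeVars f) (freeVars x)

-- Generic code only sees the Var interface, never the concrete v.
freshFor :: Var v => Term v -> v -> v
freshFor body = freshIn (freeVars body)
```

Note that `freshFor` works unchanged for both `Symbol` and `Rich`; only a caller at the top level decides which one is in play.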

Something I considered but rejected was storing richer symbol layout outside of the syntax tree. This is error prone: variables get moved around constantly when editing, and keeping the richer information in sync is a nightmare. Better to store it in the tree directly, and just use parametricity to avoid coupling yourself to the details of that extra info.

Ericson2314 · 2015-07-30T00:52:41Z

For that compiler I wrote, I did this separately for bindings and references. Unless I am missing something, it seems that much of the information this is intended to keep track of would be stored redundantly with each reference. For the same reasons it is better to store information in the AST than in a "side table", wouldn't it be better to store this information just once at the binding site?

pchiusano · 2015-07-30T03:22:54Z

You mean storing the auxiliary information where the variable is bound? My
concern with that was it seemed easy for variables to get out of sync with
their auxiliary information. But maybe if this is baked into ABT the right
way it could work well...

pchiusano · 2015-07-30T03:24:58Z

Regarding the redundancy, my thought was that it's easy to share the values
in memory, and the wire format for terms can also preserve this sharing.
But it might be cleaner to just make the representation share the auxiliary
info more directly.

Ericson2314 · 2015-07-30T04:07:17Z

Yeah, the ABT will help. And memory sharing wouldn't help unless one had some sort of memoization scheme, right?

pchiusano · 2015-07-30T14:09:12Z

So, I played with this a bit. You could do the following:

data Var = Var { freshId :: Int, name :: Text }

data ABT f v r
  = Var Var
  | Cycle r
  | Abs Var v r
  | Tm (f r) deriving (Functor, Foldable, Traversable)

data Term f v a = Term { freeVars :: Set v, annotation :: a, out :: ABT f v (Term f v a) }

So at the point where variables are introduced (the Abs constructor), you supply some auxiliary info of type v. This might have fixity and precedence info, or other info about how to display or render the corresponding Var references.

What I don't like about this is that now every function that inspects or manipulates a term needs to keep a local stack of this auxiliary info, to maintain that mapping between Var references and the corresponding v. You can no longer just pass a function a Term v, and it has everything it needs. This is error prone and more boilerplate.
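To make the bookkeeping concrete, here is a minimal sketch on a simplified term type (not the ABT above). The names `VarId`, `Ref`, and `render` are hypothetical, chosen only for this example. Because the aux info lives only at the binder, the renderer has to thread an environment to find it again at each reference.

```haskell
import qualified Data.Map as Map

-- Hypothetical variable identity: a fresh id plus a name.
data VarId = VarId Int String deriving (Eq, Ord, Show)

-- Simplified syntax: aux info of type v is stored only at the binder.
data Term v
  = Ref VarId
  | Abs VarId v (Term v)
  | App (Term v) (Term v)

-- Rendering a reference needs the aux info, so every traversal must
-- carry a map from bound variables to the info supplied at their binder.
render :: (v -> String -> String) -> Term v -> String
render display = go Map.empty
  where
    go env (Ref i@(VarId _ n)) =
      case Map.lookup i env of
        Just aux -> display aux n
        Nothing  -> n  -- free variable: no aux info is available
    go env (Abs i@(VarId _ n) aux body) =
      "\\" ++ n ++ ". " ++ go (Map.insert i aux env) body
    go env (App f x) =
      "(" ++ go env f ++ " " ++ go env x ++ ")"
```

Every other function that cares about the aux info would need the same `env` plumbing, which is the boilerplate being objected to here.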

I'm sure you could address this by writing a zipper type for ABT terms. The zipper keeps track of the association between variable references and their aux info. But now you have to use zipper traversal and extraction rather than just pattern matching, and it's more complicated for very little benefit that I can see.

Regarding the redundancy, it's not an issue, and we don't need any fancy memoization either.

When terms are created, you'll be able to specify the aux info for any variables introduced. Thus, we get the sharing for free there - you'll create that info in one place and use literally the same Haskell value in multiple places in the syntax tree.
When we resolve a hash to a symbol, we also get sharing for free there. The editor maintains a mapping from hash to symbol. Unless it repeats the lookup more than once for a symbol (which would be inefficient, since each lookup requires a server round trip), it will just share the same symbol reference everywhere in the tree.
When we serialize an ABT term, we can do so in a way that preserves this sharing info.

Regarding this last point, we can easily write the pure functions:

-- move all symbol annotations to where they are introduced
share :: Ord v 
      => Term f (Symbol v) a
      -> Term f (Symbol ()) (a, Maybe v)

-- push symbol annotations down into the corresponding variable references
unshare :: Ord v 
        => Term f (Symbol ()) (a, Maybe v) 
        -> Maybe (Term f (Symbol v) a)

(We probably need some extra info about f to implement, but you get the idea.)

What we actually serialize is the result of share, and we then unshare on the other side. Thus the sharing information is preserved. We can also just handle this translation directly in the serialization code if we want.
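For what it's worth, here is a rough sketch of what `share` and `unshare` could look like on a simplified term type with no `f` parameter. The `Symbol` and `Term` definitions below are assumptions made for the example, not the types in the codebase, and this version happens not to need the `Ord v` constraint from the signatures above.

```haskell
import qualified Data.Map as Map

-- Hypothetical symbol: fresh id, name, and an aux payload of type v.
data Symbol v = Symbol Int String v deriving (Eq, Show)

-- Simplified annotated term (an assumption; the real one is ABT-based).
data Term v a
  = Var a v
  | Lam a v (Term v a)
  | App a (Term v a) (Term v a)
  deriving (Eq, Show)

-- Move each symbol's aux info into the annotation of its binder.
share :: Term (Symbol v) a -> Term (Symbol ()) (a, Maybe v)
share (Var a (Symbol i n _))   = Var (a, Nothing) (Symbol i n ())
share (Lam a (Symbol i n x) b) = Lam (a, Just x) (Symbol i n ()) (share b)
share (App a f x)              = App (a, Nothing) (share f) (share x)

-- Push the aux info back down to every reference. Fails with Nothing
-- if a binder lacks aux info or a variable has no enclosing binder.
unshare :: Term (Symbol ()) (a, Maybe v) -> Maybe (Term (Symbol v) a)
unshare = go Map.empty
  where
    go env (Var (a, _) (Symbol i n ())) =
      (\x -> Var a (Symbol i n x)) <$> Map.lookup (i, n) env
    go env (Lam (a, Just x) (Symbol i n ()) b) =
      Lam a (Symbol i n x) <$> go (Map.insert (i, n) x env) b
    go _ (Lam (_, Nothing) _ _) = Nothing
    go env (App (a, _) f x) = App a <$> go env f <*> go env x
```

In this sketch `unshare (share t)` gives back `Just t` only for closed terms, since free variables have no binder to hold their aux info; a real version would need some extra place to stash that.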

And this is also all only really relevant if that auxiliary info is really large... which is not an issue yet.

Anyway, that is my current thinking.

Ericson2314 · 2015-07-31T04:01:20Z

What I was thinking was something like:

data ABT f binding reference r
  = Var reference
  | Cycle r
  | Abs binding r
  | Tm (f r) deriving (Functor, Foldable, Traversable)

data Term f f2 b v a = Term { freeVars :: f2 v, annotation :: a, out :: ABT f b v (Term f f2 b v a) }

For example, a de Bruijn-indexed representation is Term f Max () Word a, where Max is some singleton container that will just hold the max of anything you insert.
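A small sketch of what such a `Max` container might look like, using a standalone de Bruijn term type instead of the parameterized `Term` above. The `DB` type and the `free` function are hypothetical names for this example only.

```haskell
-- Hypothetical "singleton container": remembers only the largest
-- element ever inserted (Nothing means nothing was inserted).
newtype Max a = Max (Maybe a) deriving (Eq, Show)

instance Ord a => Semigroup (Max a) where
  -- Maybe's Ord instance puts Nothing below every Just.
  Max a <> Max b = Max (max a b)

instance Ord a => Monoid (Max a) where
  mempty = Max Nothing

-- Plain de Bruijn terms: binders carry no name, references are indices.
data DB = Ix Word | Lam DB | App DB DB

-- The analogue of freeVars: the largest index that escapes the term,
-- measured relative to the outside of the term.
free :: DB -> Max Word
free = go 0
  where
    go depth (Ix i)
      | i >= depth = Max (Just (i - depth))
      | otherwise  = mempty
    go depth (Lam b)   = go (depth + 1) b
    go depth (App f x) = go depth f <> go depth x
```

With this choice of parameters there is nothing to keep in sync at the binder, which is the sense in which the bookkeeping depends on the parameters.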

So the bookkeeping or lack thereof is wholly dependent on the parameters. We can still write share and unshare too; indeed, doing so may be easier than keeping an environment in some cases.

> and we don't need any fancy memoization either.

Don't get me wrong, the hashing is elegant, but avoiding communicating with the server when caching with the hash as a key is a memoization scheme of sorts. My concern is on the server---or wherever the hashes are recursively substituted with their subtrees---that it is easy to just pattern match on the same shared memory and then replace it with something different when doing a functional update. Without reflecting on a sort of reference (be it hash or address), there is nothing (selectively) enforcing that the sharing is preserved in future functional updates.

pchiusano · 2015-07-31T13:47:30Z

Okay, I see what you are going for there. My feeling is that it's overkill - I don't really have a need to abstract along that dimension. (For instance, I don't care to abstract over whether de Bruijn indices or ABTs are used.) And it makes the type more complicated - you will certainly be forced to define some typeclasses to write code which is generic in f2, b, v. I'm sure you could make it work... but it doesn't feel like it pays for itself. I realize this is somewhat subjective. :)

> Don't get me wrong, the hashing is elegant, but avoiding communicating with the server when caching with the hash as a key is a memoization scheme of sorts.

True. I guess my point is that we are already doing that anyway (for other efficiency reasons), so there's no additional complexity to maintaining that sharing there.

> My concern is on the server---or wherever the hashes are recursively substituted with their subtrees---that it is easy to just pattern match on the same shared memory and then replace it with something different when doing a functional update

Ah. I don't think this will be an issue, because all code other than the term / type rendering logic is or will be parametric in the choice of variable type. It's just like how a function forall a . [a] -> [a] cannot destroy sharing at the level of the a values, because it has no ability to pick apart a values. It can certainly destroy sharing of the list spine, but that is all. By doing this sort of reasoning I have convinced myself that this won't be a problem. We can keep this in mind and revisit if it does become an issue in the future.
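One way to see the parametricity argument in code is the free theorem for `[a] -> [a]`: since such a function cannot look inside the elements, mapping over the elements before or after it gives the same result. The `shuffle` function below is an arbitrary example picked for illustration, and `commutes` is a hypothetical helper that checks the property on one input.

```haskell
-- Any function of this type can only rearrange, drop, or duplicate
-- elements; it has no way to inspect or rebuild them.
shuffle :: [a] -> [a]
shuffle xs = reverse xs ++ take 1 xs

-- Free theorem for [a] -> [a]: for any g, map g . f == f . map g.
-- This checks it for shuffle on one particular list.
commutes :: Eq b => (a -> b) -> [a] -> Bool
commutes g xs = map g (shuffle xs) == shuffle (map g xs)
```

The same reasoning is what suggests that code parametric in the variable type can only pass variables around as-is, so whatever they share internally stays shared.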

By the way, even though I am rejecting your suggestions :) I appreciate that you are making them.

If you would like to work on something that would definitely be accepted, check out #25 and #24. :) I also have another project I still need to write up which involves implementing some succinct data structures. I think it will be really interesting also.

pchiusano added 5 commits July 28, 2015 23:12

WIP making ABT trees polymorphic in their variable type

23f6ab4

shared project compiling with new representation of terms

5e7b7ab

Got rid of superfluous Show constraints

5f12293

everything compiling, tests failing

47b8d6a

Fixed ordering on symbols, all code compiles and tests now pass

a769a58

pchiusano added a commit that referenced this pull request Jul 29, 2015

Merge pull request #26 from unisonweb/topic/annotated-symbols

7691613

pchiusano merged commit 7691613 into master Jul 29, 2015

pchiusano deleted the topic/annotated-symbols branch June 21, 2016 02:15
