Merging recursive record: pondering the semantics #103

Closed
aspiwack opened this issue Jul 22, 2020 · 2 comments
@aspiwack
Member

I've been thinking a bit about the problem of overriding à la Nixpkgs.

My thought is that it boils down to the semantics of merging recursive records. Or, a bit pedantically, of merging codata.

Let me illustrate. Suppose we have a bunch of pkgs:

# nixpkgs.nix
rec {
  packageA = ...;
  packageB = ... packageA ...;
}

I want to override packageA so that packageB sees the new version. The current way that we think about this is that packageA will have some default values, and to make an override, we will do something like

let nixpkgs = import ./nixpkgs.nix; in
merge nixpkgs {
  packageA = some new stuff;
}

This returns a new nixpkgs where packageA has been changed (or, rather, made more definite, as per the merging model). The question, then, is: what version of packageA does packageB see?

So far, we've been thinking of it as if recursive records were data: unfold the record as a tree, and merge recursively. It's a simple model, and it makes sense. In it, packageB sees the original version of packageA. But the fact is that, by virtue of the fields being lazy, recursive records are really codata. And maybe thinking of them this way allows us to define a merging scheme where packageB will see the overridden packageA, pretty much like in #83 (comment). I haven't thought this through, so I don't know if we can carry this thought to a happy conclusion. But that would certainly be an elegant explanation for overriding.

Related issues: #74, #83.

@yannham
Member

yannham commented Jul 28, 2020

It is not really an answer to your corecursive interpretation, but for the record I wanted to write down some thoughts on merging and overriding here. I think there are two aspects to the problem of how merging and recursive records interact: the semantics, and an efficient implementation. The second seems to be the challenging bit.

Semantics

IMHO the semantics follows quite straightforwardly from the requirements. We want recursive fields, which depend on other fields of the same record, to refer to the most recent "version":

let x = { a = 1; b = 1 + a; } in
let y = merge x { a = 2; } in
...

In this situation, x.b should be 2 and y.b should be 3. That is, recursive references to a record must be late-bound. This is for example the semantics of inheritance in Jsonnet (see the (object-index) rule in the spec, which binds the self keyword at field access) and also matches the #83 (comment) you mentioned.

This would allow for a language-supported way of doing overriding in Nickel, in contrast with the beginner-disconcerting mechanisms of Nixpkgs and NixOS modules.

(Efficient) Implementation

Take 1: dynamic scoping

For late binding to work, we should not bind the occurrence of a in b at the declaration site, but rather bind it only when the field is accessed, or at least update it on merge. Let us implement this in a naive way: when rec.field is evaluated, all fields of rec are made available in the environment, such that recursive references in field are resolved with respect to the current version of rec. We would have:

let x = { a = 1; b = 1 + a; } in
let y = merge x { a = 2; } in
y.b // Nice, this is 3!

But now, if we write this instead:

seq x.b y.b // Argh, this is 2

As soon as x.b is evaluated, the corresponding thunk is updated to a constant, and all subsequent merges - such as y - inherit this fixed value. This is not appealing, to say the least: not only does it not correctly implement the desired semantics, but the result of a pure expression would depend on which thunks have been evaluated before.
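
To make the failure concrete, here is a minimal sketch in Rust of naive thunk sharing (the Thunk type and all names are illustrative, not Nickel's actual internals): the merged record reuses the parent's thunk for b, so whichever record forces b first fixes the value for both.

use std::cell::RefCell;
use std::rc::Rc;

// A thunk shared by reference: once forced, every holder sees the result.
type Thunk = Rc<RefCell<Option<i64>>>;

fn force(thunk: &Thunk, compute: impl FnOnce() -> i64) -> i64 {
    let mut slot = thunk.borrow_mut();
    *slot.get_or_insert_with(compute)
}

fn main() {
    // x = { a = 1; b = 1 + a; }; the naive merge y = merge x { a = 2; }
    // reuses the very same thunk for the recursive field b.
    let b_thunk: Thunk = Rc::new(RefCell::new(None));
    let x_a = 1;
    let y_a = 2;

    // Forcing x.b caches 1 + x_a = 2 in the shared thunk ...
    let x_b = force(&b_thunk, || 1 + x_a);
    // ... so y.b reads back the cached 2 instead of computing 1 + y_a = 3.
    let y_b = force(&b_thunk, || 1 + y_a);

    assert_eq!(x_b, 2);
    assert_eq!(y_b, 2); // the desired late-bound semantics would give 3
}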

Take 2: functions

The problem is that b has an implicit dependency on some external, not-yet-known value a: as functional zealots, we immediately see the solution, which is to make this dependency explicit. We encode the previous example internally as:

{ a = fun self => 1; b = fun self => 1 + self.a }

Field access becomes rec.field => rec.field rec. Merge acts as follows: merge { a = a1; } { a = a2; } = { a = fun self => merge (a1 self) (a2 self) }.
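
As a sanity check of this encoding, here is a small self-contained model in Rust, where field values are just integers; the names (Record, Field, access) are made up for the example, and the override simply replaces a rather than applying the full recursive merge rule:

use std::collections::HashMap;
use std::rc::Rc;

// Every field is a function of the whole record ("self").
type Record = Rc<HashMap<String, Field>>;
type Field = Rc<dyn Fn(&Record) -> i64>;

// rec.field => rec.field rec: field access ties the recursive knot.
fn access(rec: &Record, name: &str) -> i64 {
    let f = rec.get(name).expect("no such field").clone();
    (*f)(rec)
}

fn main() {
    // { a = fun self => 1; b = fun self => 1 + self.a }
    let mut x: HashMap<String, Field> = HashMap::new();
    x.insert("a".into(), Rc::new(|_this: &Record| 1));
    x.insert("b".into(), Rc::new(|this: &Record| 1 + access(this, "a")));
    let x: Record = Rc::new(x);

    // merge x { a = fun self => 2 }: a is replaced, b is inherited as-is.
    let mut y = (*x).clone();
    y.insert("a".into(), Rc::new(|_this: &Record| 2));
    let y: Record = Rc::new(y);

    assert_eq!(access(&x, "b"), 2); // b sees the original a
    assert_eq!(access(&y, "b"), 3); // b sees the overridden a: late binding
}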

The semantics is now correct, but we have lost the benefits of laziness: since expressions are now functions, they are re-evaluated at each access. Surely we want to apply this transformation only to fields that are actually recursive (assuming that this can be determined syntactically, see the dedicated paragraph in the last section below), and to avoid transforming the a of the original example into a function. But the problem persists for genuinely recursive fields such as b: in y.b + y.b, the expression for y.b is still evaluated twice.

Take 2 bis: memoization

Well, laziness is about memoization, so let us do that. We just need to not forget the original expression. We attach a thunk to recursive fields together with their functional expression: (Option<Thunk>, Term) (or really a reference to the expression in order to share it among all descendants). The expression would be inherited when merging, but not the thunk. y.b + y.b now evaluates the expression of b only once.
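
A possible shape for this in Rust, with made-up names (the real Thunk and Term types in Nickel will differ):

use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

// A recursive field pairs its expression (shared among all descendants)
// with a per-record cache, playing the role of (Option<Thunk>, Term).
struct RecField {
    expr: Rc<dyn Fn(&Record) -> i64>, // the `fun self => ...` form
    cache: Cell<Option<i64>>,         // empty after a merge, filled on access
}

type Record = Rc<HashMap<String, Rc<RecField>>>;

// Field access: reuse the cached value if present, otherwise evaluate the
// expression against the current record and memoize the result.
fn access(rec: &Record, name: &str) -> i64 {
    let field = rec.get(name).expect("no such field").clone();
    if let Some(v) = field.cache.get() {
        return v; // y.b + y.b evaluates the expression of b only once
    }
    let v = (*field.expr)(rec);
    field.cache.set(Some(v));
    v
}

// Merge (right-biased on conflicting fields, for simplicity): expressions
// are inherited by reference, but every cache starts out empty, since an
// override may change what recursive fields evaluate to.
fn merge(left: &Record, right: &Record) -> Record {
    let mut out = HashMap::new();
    for (name, field) in left.iter().chain(right.iter()) {
        out.insert(
            name.clone(),
            Rc::new(RecField {
                expr: field.expr.clone(),
                cache: Cell::new(None),
            }),
        );
    }
    Rc::new(out)
}

With the same example as above, a merged record still computes b as 3, and repeated accesses within one record hit the cache instead of re-running the expression.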

Towards a build system

There are still inefficiencies: the thunks of recursive expressions are reset (which potentially duplicates work) at each merge, even if nothing that they depend on is actually modified:

let x = { a = 1; b = 1 + a; } in
let y = merge x { c = 1; } in
x.b + y.b // a is unchanged but b is evaluated twice

The problem faced here is actually the more general one that a build system tries to solve, at least according to the view of Build systems à la carte. We have a bunch of values (fields) with dependencies, and when some of them are modified (here, overridden by a merge), we want to update only those which may be affected. Going down the rabbit hole leads us to bake a build system into Nickel.
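
Sketching just the first step down that hole, with hypothetical structures: if the interpreter knew, for each field, which sibling fields it depends on, a merge would only need to reset the caches of the overridden fields and of their transitive dependents.

use std::collections::{HashMap, HashSet};

// Given each field's direct dependencies on sibling fields, compute the
// fields whose caches a merge must invalidate: the overridden fields plus
// everything that transitively depends on them.
fn invalidated(
    deps: &HashMap<String, HashSet<String>>,
    overridden: &HashSet<String>,
) -> HashSet<String> {
    let mut dirty = overridden.clone();
    // Naive fixpoint: a field is dirty if any of its dependencies is dirty.
    loop {
        let mut changed = false;
        for (field, field_deps) in deps {
            if !dirty.contains(field) && field_deps.iter().any(|d| dirty.contains(d)) {
                dirty.insert(field.clone());
                changed = true;
            }
        }
        if !changed {
            return dirty;
        }
    }
}

fn main() {
    // x = { a = 1; b = 1 + a; c = 1; }: b depends on a, c on nothing.
    let deps: HashMap<String, HashSet<String>> = HashMap::from([
        ("a".into(), HashSet::new()),
        ("b".into(), HashSet::from(["a".into()])),
        ("c".into(), HashSet::new()),
    ]);
    // merge x { c = 2; } only dirties c: the cache of b can be kept.
    let dirty = invalidated(&deps, &HashSet::from(["c".into()]));
    assert!(dirty.contains("c") && !dirty.contains("b"));
}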

How deep should we go?

At this point it is not totally clear to me how much all of these performance problems matter in practice. For example, the inefficiency motivating the last section only shows up if we are interleaving field accesses and merges of different generations of records. If the common pattern is rather to first perform a sequence of merges to assemble packages and then exclusively access the fields of the children, the non-sharing of evaluated thunks between parents and children is a non-issue.

I feel like the approach of Take 2 bis is a reasonably simple and efficient first step, which we can build upon later if necessary.

Misc

How does Jsonnet do it?

It is not clear from the specs or the reference how objects and laziness interact in Jsonnet. After taking a quick look at the code, it does not seem that fields using self are memoized, if object fields are memoized at all.

Statically determining dependencies

For the proposals from the Take 2 section onward to be feasible, it is necessary to determine which variables refer to recursive fields, in order to perform the syntactic transformation to a function: { a = b + 1; b = 2; } => { a = self.b + 1; b = 2; }.
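
For intuition, such a detection could look like the following in Rust, over a deliberately tiny expression type (hypothetical; a real analysis would also have to track binders introduced by functions and lets, which this sketch ignores):

use std::collections::HashSet;

// A toy expression AST: numbers, variables, and addition.
enum Expr {
    Num(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

// Collect the free variables of an expression.
fn free_vars(e: &Expr, out: &mut HashSet<String>) {
    match e {
        Expr::Num(_) => {}
        Expr::Var(x) => {
            out.insert(x.clone());
        }
        Expr::Add(l, r) => {
            free_vars(l, out);
            free_vars(r, out);
        }
    }
}

// A field is recursive iff one of its free variables names a sibling field;
// those occurrences are the ones to rewrite into self.<var>.
fn recursive_refs(e: &Expr, siblings: &HashSet<String>) -> HashSet<String> {
    let mut fv = HashSet::new();
    free_vars(e, &mut fv);
    fv.intersection(siblings).cloned().collect()
}

fn main() {
    // In { a = b + 1; b = 2; }, the analysis flags b in the definition of a.
    let a_def = Expr::Add(Box::new(Expr::Var("b".into())), Box::new(Expr::Num(1)));
    let siblings = HashSet::from(["a".to_owned(), "b".to_owned()]);
    assert!(recursive_refs(&a_def, &siblings).contains("b"));
}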

Note that this is not always possible in the presence of other dynamic binding rules. In particular, in Nix the with construct introduces such dynamic binding. If we allow both with and recursive records without the self keyword, then in the following example:

let f = fun x => {
  a = with x; b;
  b = 1;
} in

it is impossible to know whether b refers to its sibling (and should become self.b) or is actually a variable introduced by x. It can even be both, as this is determined at each usage site.

We can still work this out by mixing the initial approach of Take 1 with the later steps: instead of transforming expressions into functions of self, we just remember them in their original form. Variables are then resolved when the field is accessed. But this impedes the ability to statically determine dependencies, for both the interpreter and the programmer. This is why my preference in general goes to an unambiguous syntax where recursion is explicit at the usage site.

Effects

In the presence of non-idempotent effects, there is also the question of the re-evaluation of effects:

let a = { a = someEffect() + b; b = 1; c = someEffect(); } in
merge a { b = 2; } // should the first occurrence of someEffect be re-evaluated? Both? Or none?

@yannham
Member

yannham commented Oct 18, 2022

I think this discussion is superseded by RFC001 (#330).

@yannham yannham closed this as completed Oct 18, 2022