Merge is not idempotent #1187

yannham · 2023-03-16T15:16:15Z

Describe the bug

The merge operation is advertised as being idempotent, but it is currently not. In particular, non-empty arrays are not mergeable, even if they are the same. It's probably fine not to be idempotent on functions (or is it?), but it's a real problem on data, as detailed through actual examples below.

To Reproduce

Here is a typical example of non idempotency:

nickel>let path = ["/bin/bash", "/bin/sh"] in
path & path
error: non mergeable terms
  ┌─ repl-input-0:1:12
  │
1 │ let path = ["/bin/bash", "/bin/sh"] in
  │            ^^^^^^^^^^^^^^^^^^^^^^^^
  │            │
  │            cannot merge this expression
  │            with this expression
2 │ path & path
  │ ----------- merged here
  │
  = Both values have the same merge priority but they can't be combined

It actually happens in real life, in particular with default values. If a contract sets a default value which is a non-empty array, and the contract happened to be applied several times, then we get the following. For example, in Nixel:

The base builder BashShell defines an env field, that automatically compute the environment from the structured_env field
Most higher-level builders like RustShell or CShell "derives" from this base builder, that is RustShell = BashShell & {...}.
Now, if you want to define a development shell with both a Rust and a C environment, you get an error, because Nickel doesn't succeed in merging the two auto-generated env, while they're exactly the same. The issue is that writing MyShell = {...} & RustShell & CShell evaluates to MyShell = {...} & BashShell & { ...def of Rust shell.. } & BashShell & { ... def of CShell ... }, leading to two occurrences of BashShell.

So, "contract inheritance" can lead to the same contract being applied several time. Different parts of the code (a framework consuming a configuration versus the configuration itself) may also have to ensure that some contract is applied, and can't rely on the fact that another part will do it.

Expected behavior

Merging pure data (excluding functions, at least as a first step) should be idempotent.

Possible solutions

Until now, we've refrained from merging arrays, because it might not correspond to what the user has in mind (typically, concatenation makes more sense that pointwise merging in a lot of concrete applications). But this is becoming a problem. Here are possible solutions:

"artificially" makes merging idempotent, by testing arrays for equality. If they are equals, return the first operand, otherwise fail. This leads to no unexpected merges: if two arrays are different, merge will fail.
actually define a merging behavior on all arrays. One which sounds mathematically natural, so to speak, would be to merge elements point-wise. In fact, both record and arrays could be seen as finite-support functions. On record, merge is then just point-wise merging. It could make sense to do that for arrays as well. In practice, it means that arrays are merged "like records", provided they are seen as a record with consecutive integers keys 0, 1, ..., n-1 (where n the length). This would naturally make merging idempotent on arrays, and doesn't sound far-fetched. However, it could lead to unwilling merge to actually proceed without errors (that being said, we have a type and contract system for that).

In practice, it seems most problematic merges are actually merging the exact same definition (pointing to the same part of the AST). They could in principle be elided by an optimization which checks, before merging, if two values are physically equal, in which case the merging would be eliminated altogether (we already do that for equality testing). However, at least currently, this optimization would break referential transparency and makes the semantics quite fragile. As long as one stays within the zone where the interpreter can prove two things are physically equal, everything works fine. But if we add one indirection and get out of this zone, suddenly the semantically identical config refuses to evaluate. Thus, I see this as a possible addendum to one of the two previous solutions: only if merge is actually idempotent, this optimization is sound and doesn't change the behavior. We shouldn't rely on this optimizations for program to have the right behavior.

The text was updated successfully, but these errors were encountered:

aspiwack · 2023-03-20T08:46:36Z

My inclination is: let's do pointwise merging on arrays. I have thought about it all of 2min, and I'm not saying that it's a particularly smart answer. But it's where I'm at right now.

Concatenation, for instance, is very much not idempotent. There is a question of how we can allow custom “datatypes” with their own merge function in the future (and how to make it not be terrible, which is really non-obvious). I think concatenation (by which we really mean union of sets) could be one of these.

matthew-healy · 2023-03-20T09:17:38Z

Pointwise merging sounds mathematically natural, but I think the behaviour sounds like it could be quite counter-intuitive to an end-user. It really seems like the behaviour you want here is largely dependent on the domain the array is representing, which I guess is another vote for some form of custom merge function.

In in the meantime I'd be in favour of doing the least surprising thing. I think it's a lot easier for a layperson to understand "arrays merge iff they're equal" than to understand "arrays use pointwise merging, which means the exact behaviour depends on the contents of the arrays & how those merge".

aspiwack · 2023-03-20T09:21:14Z

The problem is that testing for equality is strict. And may cause massive evaluations in edge cases. I'm not sure that it is quite so reasonable. There is most definitely no perfect solution there. Arrays are just trickier than they first appear.

matthew-healy · 2023-03-20T09:40:05Z

Ah, that's a good point. I suppose we could implement some sort of lazy pointwise equality contract(s) instead, by deferring the equality check until the array members are actually accessed?

The main concern for me is situations like:

let creds_l = [{ username = "...", password = "..." }] in
let creds_r = [{ ssh = "..." }] in
creds_l & creds_r # [{ username = "...", password = "...", ssh = "..."}]

where a pointwise merge leads to an error that wouldn't necessarily be simple to diagnose. Even if a lazy equality check is more complex to implement, I still think the logic is simpler to understand & would lead to better diagnostics for the above kind of mistake.

That said, I guess this could be somewhat mitigated by investing in good tutorial content around merging.

aspiwack · 2023-03-20T10:07:40Z

Ah, that's a good point. I suppose we could implement some sort of lazy pointwise equality contract(s) instead, by deferring the equality check until the array members are actually accessed?

This is fair enough. It's not harder than lazy contract checking or anything. So it could be a good non-committal approach.

aspiwack · 2023-03-20T10:08:55Z

The main concern for me is situations like:
let creds_l = [{ username = "...", password = "..." }] in
let creds_r = [{ ssh = "..." }] in
creds_l & creds_r # [{ username = "...", password = "...", ssh = "..."}]
where a pointwise merge leads to an error that wouldn't necessarily be simple to diagnose. Even if a lazy equality check is more complex to implement, I still think the logic is simpler to understand & would lead to better diagnostics for the above kind of mistake.

Out of curiosity, what is the behaviour you believe that a user expects in this situation?

matthew-healy · 2023-03-20T10:47:50Z

Out of curiosity, what is the behaviour you believe that a user expects in this situation?

I don't know that users will have a specific general expectation of what should happen here, but I can certainly imagine situations where one would write code like the above assuming it would concatenate the arrays, because it's the most immediate way to "merge" the values. I think that users will likely have different expectations depending on the specific domain they're representing, and - at least for me personally - pointwise merging is unlikely to be the most intuitive option.

I also think it's quite likely many of these expectations will not be idempotent, but that's probably something to confront later if/when we think about overriding merge behaviour.

This is broadly why I think it makes sense to be as restrictive as possible here, at least initially.

yannham · 2023-03-20T10:52:43Z

I feel like point-wise "lazy" equality testing does sound like a reasonable compromise. As @aspiwack says, it's less committing. Technically, I don't expect it to be hard to implement: merge already exists in two different modes (normal merging and contract application), I think lazy equality would correspond to a form of symmetric contract application, which could be a third mode (the original one is: the left-hand side mustn't have fields that aren't in the right-hand side, while the "lazy equality" would be fields must be exactly the same).

As often, the most challenging part might be to provide good error reporting, because it involves threading error reporting data alongside recursive merges. Maybe we can make use of labels here as well?

Concatenation, for instance, is very much not idempotent. There is a question of how we can allow custom “datatypes” with their own merge function in the future (and how to make it not be terrible, which is really non-obvious). I think concatenation (by which we really mean union of sets) could be one of these.

In principle, this is answered already in RFC001 using the custom merge functions approach. It hasn't been implemented yet, and has been postponed to post 1.0, being nice but not strictly necessary to write useful programs. Custom merge functions must be "semantically" associative, idempotent and commutative (e.g. representing stuff as arrays and doing concatenation is fine if the associated final contract, or the consuming system outside of the boundaries of Nickel, does de-duplication and sorting, or doesn't care about duplication and order).

aspiwack · 2023-03-20T12:45:52Z

As often, the most challenging part might be to provide good error reporting, because it involves threading error reporting data alongside recursive merges. Maybe we can make use of a labels here as well?

This is what I'd expect. It's essentially a contract application (we can think of equality as a binary contract. Then, partially applied equality is a regular contract. So we should have the machinerie to give reasonable error messages for this).

yannham added the type: bug label Mar 16, 2023

yannham added this to the Next major (1.0) milestone Mar 16, 2023

This was referenced Mar 24, 2023

Implement array merging, part I #1199

Closed

Add Equal contract to the stdlib #1203

Merged

yannham closed this as completed in #1203 Mar 28, 2023

yannham reopened this Mar 29, 2023

This was referenced Mar 31, 2023

Add a merge label #1217

Merged

Export a parser for Types and refactor a bit parser #1227

Merged

Implement array merging, take II #1229

Merged

yannham closed this as completed in #1229 Apr 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge is not idempotent #1187

Merge is not idempotent #1187

yannham commented Mar 16, 2023

aspiwack commented Mar 20, 2023

matthew-healy commented Mar 20, 2023 •

edited

aspiwack commented Mar 20, 2023

matthew-healy commented Mar 20, 2023

aspiwack commented Mar 20, 2023

aspiwack commented Mar 20, 2023

matthew-healy commented Mar 20, 2023

yannham commented Mar 20, 2023 •

edited

aspiwack commented Mar 20, 2023

Merge is not idempotent #1187

Merge is not idempotent #1187

Comments

yannham commented Mar 16, 2023

aspiwack commented Mar 20, 2023

matthew-healy commented Mar 20, 2023 • edited

aspiwack commented Mar 20, 2023

matthew-healy commented Mar 20, 2023

aspiwack commented Mar 20, 2023

aspiwack commented Mar 20, 2023

matthew-healy commented Mar 20, 2023

yannham commented Mar 20, 2023 • edited

aspiwack commented Mar 20, 2023

matthew-healy commented Mar 20, 2023 •

edited

yannham commented Mar 20, 2023 •

edited