Set.prototype.symmetricDifference spec has unexpected behavior when given iterable containing duplicate elements #56

nanaze · 2019-01-30T06:54:54Z

noted while writing polyfill in Closure cl/229848386 (see review comments)

See
https://tc39.github.io/proposal-set-methods/#Set.prototype.symmetricDifference

In 5. 10. d. -- you're calling "remover" on newSet and using the return value to indicate whether the value is in the original set. However, since you're comparing to an iterable (which can have duplicate values) and not a set (which cannot), it means a future value of nextValue may see the same value, which you then use with remover and get a "false" value of removed, erroneously adding it back to newSet.

Instead, you should not depend on the value of newSet.delete() but independently call this.has(value) to see if it is in the original set.

Requesting you to confirm my logic. Feel free to follow up with questions.

jplaisted · 2019-01-30T16:29:32Z

To give an example, we can roughly implement this as

function symmetricDifference(set, iterable) {
  const newSet = new Set(set);
  for (const e of iterable) {
    const removed = newSet.delete(e);
    if (removed) newSet.add(e);
  }
  return newSet;
}

And this has unexpected behavior when the iterable has an even number of duplicate elements.

symmetricDifference(new Set(), [0]); // (0), correct
symmetricDifference(new Set(), [0, 1]); // (0, 1), correct
symmetricDifference(new Set(), [0, 1, 1]); // (0), incorrect
symmetricDifference(new Set(), [0, 1, 1, 1]); // (0, 1), correct
// etc

// Also unexpected if the set also contains the element...
symmetricDifference(new Set([1]), [0]); // (1, 0), correct
symmetricDifference(new Set([1]), [0, 1]); // (0), correct
symmetricDifference(new Set([1]), [0, 1, 1]); // (0, 1), incorrect
symmetricDifference(new Set([1]), [0, 1, 1, 1]); // (0), correct
//etc

Note the the spec'd algorithm is correct if the iterable does not have any duplicate elements, which means it is correct for two sets.

ljharb · 2019-01-30T16:36:30Z

The implication is that if the argument can be any iterable, that an internal Set must be used to dedupe anyways (either alongside, or, by converting the iterable to a Set). This relates to #50.

bakkot · 2019-01-30T16:50:55Z

I don't actually that this is "wrong", because symmetric difference is not normally defined in contexts where there can be duplicates, but it's inconsistent with e.g. python:

>>> set([]).symmetric_difference([0, 0])
{0}

#51 fixes this, mostly by accident.

nanaze · 2019-01-30T17:28:01Z

I don't actually that this is "wrong", because symmetric difference is not normally defined in contexts where there can be duplicates

In that case, it's problematic for the signature of the method to take an iterable, given that a iterable, unlike a set, can possibly and is likely to contain duplicate values. It is weird for behavior to undefined in some cases of valid inputs. For the method to fail in this way would be surprising to the developer and the source of subtle bugs.

I would hope that there would be some form of test case to check this behavior when implemented.

bakkot · 2019-01-30T17:37:06Z

It is weird for behavior to undefined in some cases of valid inputs.

Sorry, to be clear, the mathematical operation called "symmetric difference" is not generally defined for collections which can contain duplicates. The behavior of the function proposed here of course will be defined.

I agree that the current behavior is unnecessarily surprising, and should probably behave the way you labelled "correct". It's just not wrong, per se, because it's a case the mathematical operation doesn't really cover.

nanaze · 2019-01-30T17:54:14Z

The implication is that if the argument can be any iterable, that an internal Set must be used to dedupe anyways (either alongside, or, by converting the iterable to a Set). This relates to #50.

That is a solution. There may be a way to chain iterables/iterators to avoid creation of yet-another set -- I will contemplate this.

gsathya · 2019-01-30T23:40:01Z

I agree that this is a bug and the intention is to support an iterable as an argument, not just a Set.

Ginden · 2019-02-01T02:54:54Z

I see two options:

dedupe iterable before iteration
throw error

I'm not sure which one is better, because obviously, symmetric difference doesn't make sense for iterables with non-unique members.

bakkot · 2019-02-01T03:00:40Z

I'd lean towards deduping personally.

(Doesn't have to be before iteration, though. Again, see #51, which mostly-incidentally fixes this.)

nanaze · 2019-02-01T18:22:25Z

throw error

I would find this behavior really undesirable.

nanaze · 2019-02-01T18:40:21Z

It seems to me that permitting iterables as parameters on these methods is a convienence to the caller to accept a liberal set of inputs, rather than forcing to a set before calling. This mirror's Python behavior with the set methods, but not the set operators, which require both operands to be sets.

>>> a = set([1,2])
>>> a.union([3])
set([1, 2, 3])
>>> a | set([3])
set([1, 2, 3])
>>> a | [3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'set' and 'list'

There are also obvious benefits to accepting an iterable -- at no point do all elements in the iterable need to be in an in-memory set, which is a good property for iterables with a very large numbers of elements.

Presumably, this isn't much of a problem for implementing union, intersection, and difference, in which only one of the two containers needs to be hashable for lookups. I suspect this is not true for symmetricDifference (I would love to be proven wrong here). So the options I see are to, as mentioned above, to "dedupe" (which I take to mean creating a set from the iterable), which sadly loses the property described in the previous paragraph, or to tighten the parameter type to a Set, which makes symmetricDifference an outlier from the other operations, but at least makes it clear that the operation cannot be done with the same space efficiency as the others.

ljharb · 2019-02-01T18:42:18Z

I question that most users of these APIs will be thinking about space efficiency as much as they'll be thinking about value uniqueness.

nanaze · 2019-02-01T18:50:22Z

I question that most users of these APIs will be thinking about space efficiency as much as they'll be thinking about value uniqueness.

I'm largely ambivalent about which route to take save for the fact that there is an implicity about the method taking an iterable and the space/time performance. Either route forward is fine with me -- I just want to illuminate the implications for the sake of decision making.

tbondwilkinson · 2019-02-12T01:00:20Z

I believe we could implement this with an iterable without storing iterable in memory:

function symmetricDifference(set, iterable) {
  const newSet = new Set(set);
  for (const e of iterable) {
    if (set.has(e)) {
      newSet.delete(e);
    } else {
      newSet.add(e);
    }
  }
  return newSet;
}

Takes advantage of the fact that we never delete from the original set.

nanaze · 2019-02-12T01:11:24Z

(disclosure -- @tbondwilkinson and I know each other and have discussed this outside this thread)

My biggest concern is "flattening" iterable -- in the other methods, it's a desirable quality to not flatten an iterable as it might be backed by I/O and a giant data set -- "this" is already in memory, so it probably is conceivably "flattenable".

If @tbondwilkinson 's suggestion passes test caes, I believe it would be much preferable to flattening iterable.

nanaze · 2019-02-12T01:22:31Z

see also some sanity case runs: https://codepen.io/anon/pen/KJRrzP?editors=0011

bakkot · 2022-09-17T19:21:16Z

I've just rewritten the spec text; the new algorithm does not have this problem. It takes the essentially approach mentioned in the above comment, though now the calls to add and delete operate on a spec-internal list and so are not observable; it's observably equivalent to copying the argument into a new Set and then intersecting that with the receiver.

nanaze changed the title ~~possible bug in Set.prototype.symmetricDifference spec~~ Set.prototype.symmetricDifference spec has unexpected behavior when given iterable contains duplicate elements Jan 30, 2019

Ginden added the bug label Feb 1, 2019

nanaze changed the title ~~Set.prototype.symmetricDifference spec has unexpected behavior when given iterable contains duplicate elements~~ Set.prototype.symmetricDifference spec has unexpected behavior when given iterable containing duplicate elements Feb 2, 2019

bakkot closed this as completed Sep 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set.prototype.symmetricDifference spec has unexpected behavior when given iterable containing duplicate elements #56

Set.prototype.symmetricDifference spec has unexpected behavior when given iterable containing duplicate elements #56

nanaze commented Jan 30, 2019

jplaisted commented Jan 30, 2019

ljharb commented Jan 30, 2019

bakkot commented Jan 30, 2019 •

edited

Loading

nanaze commented Jan 30, 2019

bakkot commented Jan 30, 2019

nanaze commented Jan 30, 2019

gsathya commented Jan 30, 2019

Ginden commented Feb 1, 2019

bakkot commented Feb 1, 2019

nanaze commented Feb 1, 2019

nanaze commented Feb 1, 2019 •

edited

Loading

ljharb commented Feb 1, 2019

nanaze commented Feb 1, 2019

tbondwilkinson commented Feb 12, 2019 •

edited

Loading

nanaze commented Feb 12, 2019

nanaze commented Feb 12, 2019

bakkot commented Sep 17, 2022

Set.prototype.symmetricDifference spec has unexpected behavior when given iterable containing duplicate elements #56

Set.prototype.symmetricDifference spec has unexpected behavior when given iterable containing duplicate elements #56

Comments

nanaze commented Jan 30, 2019

jplaisted commented Jan 30, 2019

ljharb commented Jan 30, 2019

bakkot commented Jan 30, 2019 • edited Loading

nanaze commented Jan 30, 2019

bakkot commented Jan 30, 2019

nanaze commented Jan 30, 2019

gsathya commented Jan 30, 2019

Ginden commented Feb 1, 2019

bakkot commented Feb 1, 2019

nanaze commented Feb 1, 2019

nanaze commented Feb 1, 2019 • edited Loading

ljharb commented Feb 1, 2019

nanaze commented Feb 1, 2019

tbondwilkinson commented Feb 12, 2019 • edited Loading

nanaze commented Feb 12, 2019

nanaze commented Feb 12, 2019

bakkot commented Sep 17, 2022

bakkot commented Jan 30, 2019 •

edited

Loading

nanaze commented Feb 1, 2019 •

edited

Loading

tbondwilkinson commented Feb 12, 2019 •

edited

Loading