Set.prototype.symmetricDifference spec has unexpected behavior when given iterable containing duplicate elements #56
Comments
To give an example, we can roughly implement this as
And this has unexpected behavior when the iterable has an even number of duplicate elements.
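A minimal sketch of the behavior described above, assuming the draft algorithm does what this issue says it does: delete each incoming value from the result, and add it back when nothing was deleted. `draftSymmetricDifference` is a hypothetical standalone helper, not the spec text or the proposal's polyfill.

```javascript
// Hypothetical helper: toggles membership per incoming value, following the
// draft steps as they are described in this issue (an assumption, not the spec text).
function draftSymmetricDifference(set, iterable) {
  const newSet = new Set(set);
  for (const value of iterable) {
    // Set.prototype.delete returns true only if the value was present.
    const removed = newSet.delete(value);
    if (!removed) {
      newSet.add(value);
    }
  }
  return newSet;
}

// No duplicates: matches the usual symmetric difference.
console.log([...draftSymmetricDifference(new Set([1, 2]), [2, 3])]); // [1, 3]

// Duplicates: the second 0 undoes whatever the first 0 did.
console.log([...draftSymmetricDifference(new Set(), [0, 0])]); // []
console.log([...draftSymmetricDifference(new Set([0]), [0, 0])]); // [0]
```

With duplicate-free input the result agrees with the usual definition; with an even number of repeats, the value ends up in whatever state it started in.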
Note that the spec'd algorithm is correct if the iterable does not have any duplicate elements, which means it is correct for two sets.
The implication is that if the argument can be any iterable, an internal Set must be used to dedupe anyway (either alongside the iteration or by converting the iterable to a Set first). This relates to #50.
I don't actually think that this is "wrong", because symmetric difference is not normally defined in contexts where there can be duplicates, but it's inconsistent with e.g. Python:
>>> set([]).symmetric_difference([0, 0])
{0}
#51 fixes this, mostly by accident.
In that case, it's problematic for the signature of the method to take an iterable, given that an iterable, unlike a set, can and is likely to contain duplicate values. It is weird for behavior to be undefined in some cases of valid inputs. For the method to fail in this way would be surprising to the developer and a source of subtle bugs. I would hope that there would be some form of test case to check this behavior when implemented.
Sorry, to be clear, the mathematical operation called "symmetric difference" is not generally defined for collections which can contain duplicates. The behavior of the function proposed here of course will be defined. I agree that the current behavior is unnecessarily surprising, and should probably behave the way you labelled "correct". It's just not wrong, per se, because it's a case the mathematical operation doesn't really cover.
That is a solution. There may be a way to chain iterables/iterators to avoid creation of yet another set -- I will contemplate this.
I agree that this is a bug and the intention is to support an iterable as an argument, not just a Set.
I see two options:
I'm not sure which one is better, because obviously, symmetric difference doesn't make sense for iterables with non-unique members.
I'd lean towards deduping personally. (Doesn't have to be before iteration, though. Again, see #51, which mostly-incidentally fixes this.)
I would find this behavior really undesirable.
It seems to me that permitting iterables as parameters on these methods is a convenience to the caller, accepting a liberal set of inputs rather than forcing conversion to a set before calling. This mirrors Python's behavior with the set methods, but not the set operators, which require both operands to be sets.
There are also obvious benefits to accepting an iterable -- at no point do all elements in the iterable need to be in an in-memory set, which is a good property for iterables with a very large number of elements. Presumably, this isn't much of a problem for implementing union, intersection, and difference, in which only one of the two containers needs to be hashable for lookups. I suspect this is not true for symmetricDifference (I would love to be proven wrong here).
So the options I see are, as mentioned above, either to "dedupe" (which I take to mean creating a set from the iterable), which sadly loses the property described in the previous paragraph, or to tighten the parameter type to a Set, which makes symmetricDifference an outlier from the other operations, but at least makes it clear that the operation cannot be done with the same space efficiency as the others.
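The "dedupe" option can be sketched as follows, assuming it means materializing the argument as a Set before running the toggle-style loop. `dedupedSymmetricDifference` is a hypothetical name; the extra Set is exactly the memory cost discussed above.

```javascript
// Hypothetical helper illustrating the "dedupe first" option.
function dedupedSymmetricDifference(set, iterable) {
  // Assumption: deduping means building a Set, so every distinct element
  // of the iterable is held in memory at once.
  const other = new Set(iterable);
  const result = new Set(set);
  for (const value of other) {
    // Each distinct value is seen exactly once, so toggling is safe here.
    if (!result.delete(value)) {
      result.add(value);
    }
  }
  return result;
}

console.log([...dedupedSymmetricDifference(new Set(), [0, 0])]); // [0]
console.log([...dedupedSymmetricDifference(new Set([0]), [0, 0])]); // []
```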
I question that most users of these APIs will be thinking about space efficiency as much as they'll be thinking about value uniqueness.
I'm largely ambivalent about which route to take, save for the implicit link between a method taking an iterable and its space/time performance. Either route forward is fine with me -- I just want to illuminate the implications for the sake of decision making.
I believe we could implement this with an iterable without storing iterable in memory:
Takes advantage of the fact that we never delete from the original set.
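A minimal sketch of that idea, assuming membership is checked against the original set (which is never mutated) rather than inferred from the result of deleting. `streamingSymmetricDifference` and the `repeats` generator are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: consumes the iterable once and never stores it.
function streamingSymmetricDifference(set, iterable) {
  const result = new Set(set);
  for (const value of iterable) {
    if (set.has(value)) {
      // In the original set: it must not appear in the result.
      // Deleting a second time is a harmless no-op.
      result.delete(value);
    } else {
      // Not in the original set: it belongs in the result.
      // Adding a second time is a harmless no-op.
      result.add(value);
    }
  }
  return result;
}

// A generator stands in for an iterable that is never held in memory as a whole.
function* repeats() {
  yield 0;
  yield 0;
  yield 1;
}

console.log([...streamingSymmetricDifference(new Set([0, 2]), repeats())]); // [2, 1]
console.log([...streamingSymmetricDifference(new Set(), [0, 0])]); // [0]
```

Because both branches are idempotent, repeated values in the iterable no longer flip the outcome.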
(disclosure -- @tbondwilkinson and I know each other and have discussed this outside this thread) My biggest concern is "flattening". If @tbondwilkinson's suggestion passes test cases, I believe it would be much preferable to flattening the iterable.
See also some sanity case runs: https://codepen.io/anon/pen/KJRrzP?editors=0011
I've just rewritten the spec text; the new algorithm does not have this problem. It takes essentially the approach mentioned in the above comment, though now the calls to
noted while writing polyfill in Closure cl/229848386 (see review comments)
See
https://tc39.github.io/proposal-set-methods/#Set.prototype.symmetricDifference
In 5. 10. d. -- you're calling "remover" on newSet and using the return value to indicate whether the value is in the original set. However, since you're comparing to an iterable (which can have duplicate values) and not a set (which cannot), a later nextValue may repeat an earlier value. Calling remover on it then gives a "false" value of removed, and the value is erroneously added back to newSet.
Instead, you should not depend on the value of newSet.delete() but independently call this.has(value) to see if it is in the original set.
Requesting you to confirm my logic. Feel free to follow up with questions.