This repository has been archived by the owner on Jul 8, 2024. It is now read-only.

Avoid "delete" and "add"? #50

Closed
ljharb opened this issue Jan 15, 2019 · 75 comments

@ljharb
Member

ljharb commented Jan 15, 2019

After the latest round of changes, I wanted to start fresh and summarize my updated thought process:

Produce a new set

These construct a new set from the receiver, via its species constructor.

  • union
  • intersection
  • difference
  • symmetricDifference

"Addable"

These create a contract around having an "add" method.

  • union
  • intersection
  • symmetricDifference

"Queryable"

These create a contract around having a "has" method.

  • intersection
  • isSubsetOf
  • isSupersetOf
  • isDisjointFrom

"Deletable"

These create a contract around having a "delete" method.

  • difference
  • symmetricDifference

Generic receiver

These work with a non-Set iterable receiver:

  • isSubsetOf
  • isSupersetOf
  • isDisjointFrom

It seems like it'd be nicer to have only one string-based protocol added by this proposal instead of three. Is there any way we could build up an iterable of "items to add" in all addable/deletable cases, and construct the new set with that, rather than needing the "add" or "delete" method?
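
Roughly the shape I have in mind, using difference as an example (a hand-wavy userland sketch, not spec text; SameValueZero and duplicate handling are left to the constructor):

// Sketch only: build up a plain list of "items to keep", then hand the whole
// list to the (species) constructor, so the algorithm itself never calls
// `add` or `delete`.
Set.prototype.difference = function (iterable) {
  const Ctor = (this.constructor && this.constructor[Symbol.species]) || Set;
  const toExclude = new Set(iterable); // membership testing only, for the sketch
  const kept = [];
  for (const item of this) {
    if (!toExclude.has(item)) {
      kept.push(item);
    }
  }
  return new Ctor(kept); // the constructor is what ends up calling `add`
};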

@bakkot
Collaborator

bakkot commented Jan 15, 2019

Note that the Set constructor already creates a contract around having .add, though it's hard to observe.

Also, I think it'd be helpful to this issue to split out whether the contract applies to the receiver, to instances of the receiver's species constructor, or to the argument. Contracts which apply only to the receiver or its species constructor seem like a different sort of thing than contracts which apply to arguments.

@gsathya
Member

gsathya commented Jan 15, 2019

It seems like it'd be nicer to have only one string-based protocol added by this proposal instead of three.

IMO, there's only one protocol added by this proposal, and that's the has-protocol, added largely for isSubsetOf (but also intersection), which works directly on the argument/receiver. The rest of these methods work on an object constructed using the SpeciesConstructor, so I wouldn't necessarily classify these as new protocols. I expect these to be subclasses of Set which hold these methods.

Is there any way we could build up an iterable of "items to add" in all addable/deletable cases, and construct the new set with that, rather than needing the "add" or "delete" method?

The biggest problem with this is: how would this work with subclassing? Also, given the above about how these aren't necessarily new protocols, I'm not sure if there's a lot of value in trying to do this.

@ljharb
Member Author

ljharb commented Jan 15, 2019

Since the entire list would be passed into the subclass's constructor, the subclass would handle it in its constructor (which indeed calls add, as @bakkot pointed out).

"delete" seems to be a new protocol, though.

@bakkot
Collaborator

bakkot commented Jan 15, 2019

@ljharb To be clear, you are proposing that Set.prototype.intersection would look something like

// assume the existence of a CreateListFromIterable abstract operation

1. Let _set_ be the *this* value.
1. If Type(_set_) is not Object, throw a *TypeError* exception.
1. Let _Ctr_ be ? SpeciesConstructor(_set_, %Set%).
1. Let _thisIteratorRecord_ be ? GetIterator(_set_).
1. Let _otherIteratorRecord_ be ? GetIterator(_iterable_).
1. Let _thisItems_ be ? CreateListFromIterable(_thisIteratorRecord_).
1. Let _otherItems_ be ? CreateListFromIterable(_otherIteratorRecord_).
1. Let _bothItems_ be a new empty List.
1. For each element _thisItem_ of _thisItems_, do
    1. For each element _otherItem_ of _otherItems_, do
        1. If SameValueZero(_thisItem_, _otherItem_) is *true*, add _thisItem_ to _bothItems_.
1. (Possibly some sort of "filter out duplicates per SameValueZero" step here.)
1. Let _array_ be CreateArrayFromList(_bothItems_).
1. Return ? Construct(_Ctr_, _array_).

? And similar changes for union, intersection, difference, and symmetricDifference, but no changes to isSubsetOf, isSupersetOf, or isDisjointFrom?

@ljharb
Member Author

ljharb commented Jan 15, 2019

I think "filter out duplicates per SameValueZero" can be handled by the constructor, but yep, exactly that. I'm sure we could abstract most of those steps into an abstract op as well, a la AddEntriesFromIterable.

@bakkot
Collaborator

bakkot commented Jan 15, 2019

That means that intersection is O(len(this) + len(other)), instead of O(len(other)). That seems unfortunate.

@bakkot
Collaborator

bakkot commented Jan 15, 2019

Though I guess if that's a major concern, difference ought to treat its argument the same way isSubsetOf does. (Actually, maybe it should anyway?)

@gsathya, might it be worth doing this so as to avoid having to guard Set.prototype.delete? Not necessarily advocating, just asking.

@gsathya
Member

gsathya commented Jan 15, 2019

? And similar changes for union, intersection, difference, and symmetricDifference, but no changes to isSubsetOf, isSupersetOf, or isDisjointFrom?

I'd love to do this: remove all lookups, operate on an internal list and create a Set from it at the end. The three main reasons I didn't do this are:
(a) Consistency: we'd be moving away from the pattern used by the Set constructor (which looks up the add method). I wasn't sure if the committee would be happy with this.
(b) Subclassing becomes less useful if internal operations are going to bypass overridden methods
(c) The runtime cost that @bakkot described (...although this won't happen in almost all common cases)

I don't feel too strongly about any of these but I bet the committee does.

Though I guess if that's a major concern, difference ought to treat its argument the same way isSubsetOf does. (Actually, maybe it should anyway?)

Can you be more specific about the changes here?

@gsathya, might it be worth doing this so as to avoid having to guard Set.prototype.delete? Not necessarily advocating, just asking.

That's an excellent observation. It'd be great to get rid of this call, definitely.

@bakkot
Collaborator

bakkot commented Jan 15, 2019

remove all lookups, operate on an internal list and create a Set

To make sure we're all on the same page, I was assuming that there would still be lookups for Symbol.iterator and 'has', and a call to the species constructor, which would itself call add (typically). But the actual logic would be internal operations on spec Lists, yes, and would thereby avoid calls to delete and add (except indirectly by calling the constructor).

Consistency: we'd be moving away from the pattern used by the Set constructor (which looks up the add method). I wasn't sure if the committee would be happy with this.

FWIW I personally wouldn't object: as long as it calls the species constructor with the list it's come up with, rather than manipulating internal slots directly, I think it's sufficiently consistent. No idea what the rest of the committee would think.

Subclassing becomes less useful if internal operations are going to bypass overridden methods

If those methods are defined purely in terms of Symbol.iterator, 'has', and calls to the species constructor, I would expect that to be OK: a well-behaved subclass will need to make those work consistently too. Again, no idea what the rest of the committee would think.

The runtime cost that @bakkot described

I think this can also be avoided if you're willing to go with a somewhat hybrid approach, as long as everyone is on board with the "has" thing (on receivers). Something like

Set.prototype.intersection = function (iterable) {
  let Ctr = SpeciesConstructor(this, Set);
  let bothItems = []; // assume this is a spec list
  for (let element of iterable) {
    if (this.has(element)) {
      bothItems.push(element);
    }
  }
  bothItems = FilterSameValueZeroDuplicates(bothItems);
  return new Ctr(bothItems);
}

Can you be more specific about the changes here?

I am envisioning something like

Set.prototype.difference = function (iterable) {
  let Ctr = SpeciesConstructor(this, Set);
  let newSet = new Ctr;

  let otherSet = iterable;
  let hasCheck = otherSet.has;
  if (typeof hasCheck !== 'function') {
    otherSet = new Set(iterable);
    hasCheck = otherSet.has;
    if (typeof hasCheck !== 'function') throw new TypeError;
  }

  // assuming we don't take the approach suggested in this issue
  // if we did, presumably we'd build up the list of elements internally, then call Ctr with it

  for (let element of this) {
    if (!hasCheck.call(otherSet, element)) {
      newSet.add(element);
    }
  }
  return newSet;
}

@bakkot
Collaborator

bakkot commented Jan 15, 2019

I can make a PR so we can all see the diff, if you want, though I can't guarantee I'd get to it for a day or two.

@gsathya
Member

gsathya commented Jan 15, 2019

To make sure we're all on the same page, I was assuming that there would still be lookups for Symbol.iterator and 'has', and a call to the species constructor, which would itself call add (typically). But the actual logic would be internal operations on spec Lists, yes, and would thereby avoid calls to delete and add (except indirectly by calling the constructor).

ACK. This matches what I was thinking as well.

I think this can also be avoided if you're willing to go with a somewhat hybrid approach, as long as everyone is on board with the "has" thing (on receivers). Something like

LGTM

I am envisioning something like

Looks like we have two allocations here.. that seems unfortunate. Although I can probably optimize that away for the common cases.

I can make a PR so we can all see the diff, if you want, though I can't guarantee I'd get to it for a day or two.

Sounds great. Thanks!

@bakkot
Collaborator

bakkot commented Jan 15, 2019

Looks like we have two allocations here.. that seems unfortunate.

Do you mean in the typeof hasCheck !== 'function' branch, or something else? That's so, but matches the same branch for isSubsetOf in its current state, which causes that method to have an allocation where otherwise there would be none. (The approach taken in #43 would get rid of those allocations :D )

@Ginden
Collaborator

Ginden commented Jan 16, 2019

Also, I think it'd be helpful to this issue to split out whether the contract applies to the receiver, to instances of the receiver's species constructor, or to the argument. Contracts which apply only to the receiver or its species constructor seem like a different sort of thing than contracts which apply to arguments.

Only isSubsetOf checks for the presence of a has method on its argument. This can (and probably should) be avoided by always calling Construct(%Set%, iterable).

Other methods enforce contracts on either this (the receiver) or the result of Construct(SpeciesConstructor(this, %Set%)).

@bakkot
Collaborator

bakkot commented Jan 16, 2019

Made a PR: #51.

@domenic
Member

domenic commented Jan 16, 2019

Hey folks, watching this from the sidelines and I continue to be struck by the over-complexity that we are injecting into the spec, in the name of genericism, subclassability, and the desire to touch the public API as many times as possible in order to give other classes (or other thisArgs) a chance to insert their own behaviors. As someone who did the same thing with promises in ES2015, and lived to regret it greatly, I find it heartbreaking to see those same patterns propagated further. I'd hoped we'd do things differently going forward.

As such, I want to present some alternatives that I've written up. I've done only the difference() method, but you can apply the same principles to all of them.

The proposed spec text is at https://gist.github.com/domenic/44773a7610dc8541bf2a83df2a3ce990. The guiding principles, which I think are fairly generally applicable, are:

  • Convert from user-hookable objects (Set or iterable) to spec-internal data structures ASAP, "at the boundary".
  • Operate entirely on the spec-internal data structures for the actual algorithm you're performing.
  • Convert back to user-hookable objects right before returning to the caller, no sooner
  • Extract and emplace data out of/into Sets by using their internal slots as the source of truth
  • Hard-code your object creation to not be hookable
    • I'm unsure on this particular principle, as it just seems kind of nicer to use new this.constructor(). But hard-coding is way nicer than the gross species dance, and subclassing in general has not panned out well. The dream that "all the inherited methods will automatically do the right thing" that drives using new (this.constructor && this.constructor[Symbol.species] || Set)() has failed pretty badly, so I'd rather abandon it than keep it up.

I'm unsure how others feel about this, but there has been a lot of discussion, so I thought it couldn't hurt to try to put something out there that IMO is a lot better as a way forward.
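
To make the shape concrete, a userland approximation of the difference() text in that gist might look like this (a sketch only; the real spec text reads [[SetData]] directly, which userland code can only approximate by snapshotting via iteration):

// Sketch: convert at the boundary, compute on plain data, build the result at the end.
Set.prototype.difference = function (iterable) {
  // 1. Snapshot the receiver and the argument into internal-style lists.
  const thisItems = [...this];           // stands in for reading this.[[SetData]]
  const otherItems = new Set(iterable);  // stands in for CreateListFromIterable + dedupe
  // 2. Run the whole algorithm on that data, with nothing observable in the middle.
  const resultItems = thisItems.filter(item => !otherItems.has(item));
  // 3. Convert back to a user-visible object only at the very end (hard-coded to Set).
  return new Set(resultItems);
};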

@gsathya
Member

gsathya commented Jan 17, 2019

Hey folks, watching this from the sidelines and I continue to be struck by the over-complexity that we are injecting into the spec, in the name of genericism, subclassability, and the desire to touch the public API as many times as possible in order to give other classes (or other thisArgs) a chance to insert their own behaviors

The over-complexity is exactly the problem I'm struggling with too. I'm worried that we're trying to make this work for everyone's use case, turning this into a bit of a kitchen sink.

The proposed spec text is at https://gist.github.com/domenic/44773a7610dc8541bf2a83df2a3ce990.

I like this approach a lot, makes everything way simpler. It's straightforward to implement and optimize, but cripples subclassing... which is totally fine by me.

I'm slightly worried about this turning into a discussion about subclassing in general (and not about set methods), but I'd like to hear more from others about this approach.

@littledan @zenparsing WDYT?

@domenic
Member

domenic commented Jan 17, 2019

I think "cripples subclassing" is overstating it a bit. It's important to separate subclassing and the desire to touch public APIs.

Subclassing still works fine in general. The only thing that might be slightly surprising is that you get back Set instances instead of Subclass instances. But we could back off a bit and re-include new this.constructor() (or, if you must, new (this.constructor && this.constructor[Symbol.species] || Set)()).

It's the pattern of going through as many public APIs as possible, in order to allow arguments or thisArgs to customize the algorithm's behavior, that the approach specifically avoids. Instead it treats the [[SetData]] as the source of truth---but any well-behaved subclass (i.e. one that calls super constructor and super methods) will work fine with that constraint.

@gsathya
Member

gsathya commented Jan 17, 2019

It's important to separate subclassing and the desire to touch public APIs.

Interesting distinction. In my mind, the reason to use the public APIs is to allow subclasses to do this exact sort of customization. But I think your point is that subclassing is more about "extending the behavior" rather than "replacing the behavior", in which case you're right, subclassing isn't affected as much.

@zenparsing
Member

I agree that Symbol.species leaves us with an... interesting legacy moving forward. On the other hand, having most of the current built-ins use species and some new ones not would be a different form of unfortunate legacy.

Leaving aside species, I agree with the general design goal of avoiding unnecessary customizability "in the middle" of these algorithms. I need to think a little more about the semantics for each of these operations before I can comment usefully, though.

@bakkot
Collaborator

bakkot commented Jan 17, 2019

It's the pattern of going through as many public APIs as possible, in order to allow arguments or thisArgs to customize the algorithm's behavior

I don't think "customize the algorithm's behavior" is the right way to think of this. The point of Symbol.iterator is to allow consumers to iterate over the contents a container. Algorithms which want to iterate over the contents of a container should therefore use Symbol.iterator. Internal slots are precisely that - internal. Subclasses shouldn't have to worry about them as long as they maintain a consistent public API. (I very much do not agree that a well-behaved subclass must always call its super methods.)

I don't altogether care that much about species, and I agree it's perfectly reasonable to make the internal logic unhookable (cf #51), but I don't think it's correct to think of the "get contents" / "query membership" operations on the receiver or argument as being internal.

@zenparsing
Member

I'm probably missing some context, but here are some late night thoughts:

Currently, the extensibility point for Set is the "add" method. It provides subclasses with the ability to map elements, filter the range of acceptable elements, etc. If we are going to support subclassing, then my first inclination would be to retain "add" as the only extensibility point, and use [[SetData]] for everything else.

For the operations that generate new sets, I would probably start by seeing if I could create a new, empty instance with the species constructor and then directly use "add" to build it up. I would try to avoid creating an array just to pass it into the species constructor.

I agree with @domenic insofar as I don't think we should feel compelled to make these algorithms generic over the receiver. We should be able to use [[SetData]] and not have to go through the ceremony of converting thisArg.[[SetData]] into an iterator.

@domenic
Member

domenic commented Jan 17, 2019

Algorithms which want to iterate over the contents of a container should therefore use Symbol.iterator.

Sure, if you want to accept iterators, you should go through Symbol.iterator, as my second example does. My contention though is that if you want to accept Sets (as either thisArgs or arguments), you should go through [[SetData]] as the source of truth. You should not go through has()/delete()/add().

@bakkot
Collaborator

bakkot commented Jan 17, 2019

If what you want to do with a Set is to iterate it, you should go through Symbol.iterator. Treating [[SetData]] as the source of truth for how to iterate means breaking subclassing.

For example, here is a perfectly reasonable Set subclass; it mostly just orders things differently. Why should Set.prototype.union be forced to give an answer inconsistent with the other methods when one of these is its receiver or argument? If it goes through Symbol.iterator, it will automatically be consistent.
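
(For illustration, a hypothetical sketch of that kind of subclass; the original comment linked to a concrete example. It keeps Set's membership semantics but exposes a different iteration order through the public protocol.)

// Hypothetical sketch: same membership semantics as Set, but the public
// iteration protocol yields elements in sorted order instead of insertion order.
class SortedSet extends Set {
  *[Symbol.iterator]() {
    // Default sort for brevity; a real version would take a comparator.
    yield* [...super[Symbol.iterator]()].sort();
  }
  keys() { return this[Symbol.iterator](); }
  values() { return this[Symbol.iterator](); }
  *entries() {
    for (const v of this) yield [v, v];
  }
  forEach(fn, thisArg) {
    for (const v of this) fn.call(thisArg, v, v, this);
  }
}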

We already have a protocol for iterating things. If the only thing an algorithm is doing with a value is iterating over its contents, it should use this protocol. Even if there is an internal slot it could use instead.

@domenic
Member

domenic commented Jan 17, 2019

I guess I don't have much to say other than that I disagree. When operating on a Set, [[SetData]] is a better source of truth in my opinion than the monkey-patchable public API. Subclasses which don't keep an accurate [[SetData]] are not, in my opinion, "perfectly reasonable".

@bakkot
Collaborator

bakkot commented Jan 17, 2019

I guess I assume that subclasses should not have to care (or think) about the contents of internal slots, as long as the public API does the right thing. That seems almost tautological. That is what "internal" means.

@domenic
Member

domenic commented Jan 17, 2019

What we're discussing here is precisely what the public API will do. So I think with either design, subclasses that care about making the public API do the right thing will be well-behaved (another tautology).

gsathya mentioned this issue Jan 17, 2019
@gsathya
Member

gsathya commented Jan 17, 2019

We already have a protocol for iterating things. If the only thing an algorithm is doing with a value is iterating over its contents, it should use this protocol. Even if there is an internal slot it could use instead.

Why don't the existing Set methods use the iteration protocol? If this isn't a problem for existing methods, why is this a problem now?

@bakkot
Collaborator

bakkot commented Jan 17, 2019

If x is an instance of a subclass, and y is not, x cannot make y.union(x) do the right thing unless Set.prototype.union uses the iteration protocol rather than [[SetData]].

But also, the public API already exists. My example above is currently well behaved. Adding new methods to Set.prototype should not break that unless there's good reason to. And "we want to route around existing protocols for answering the specific question you have" is not a good reason.

Why don't the existing Set methods use the iteration protocol? If this isn't a problem for existing methods, why is this a problem now?

None of them take a set as an argument, only a receiver, so a subclass can override their behavior.

(But also, they should, and I would very much like to change that.)

@bakkot
Collaborator

bakkot commented Jan 17, 2019

it's crucial that, given an arbitrary Set that you don't control (whose methods you don't have the ability to edit), that an argument passed to that Set's methods be able to customize what those methods do

That's... kind of an inside-out way of looking at it. What I would say is that it is crucial that a Set subclass be able to define how to add to, query membership in, and iterate that subclass. Since it happens that "is subset of" is defined only in terms of those operations, yes, that means the behavior of isSubsetOf should be affected in a particular way by those customizations, in exactly the same way as any other consumer would be.

a Set subclass which lacks the capability to exert this power on methods that receive instances of the subclass is incorrect

The Set subclass I gave above would behave inconsistently when passed to the hypothetical isSubsetOf. It is currently a correct subclass, in the sense that it behaves consistently when passed to anything that accepts Set subclasses - certainly when passed to a userland implementation of isSubsetOf. I think this is a fairly universal definition of "correct subclass". If we added an isSubsetOf which queried membership by reading [[SetData]], it would not be and could not be.

@littledan
Member

I want to second the voices here arguing for conversion or checks "at the boundary" rather than many observable points throughout the algorithm.

Talking about "subclassable builtins" can be confusing: The first part of subclassability is new.target, and we're good on all that (even for web platform objects, even though WebIDL doesn't currently specify it!). I'm genuinely happy about that change; it allows simple subclassing that adds methods, fields, etc. And the second part is adding in these extra points to modify behavior. I think calling into dynamically dispatched methods can be good sometimes, but I'm still not convinced that it should be our default.

I like the idea of object oriented design, and using method dispatch where it makes sense, but I think our default should be simple, predictable algorithms, and we should call into observable points where we have a particular use case for them. Although ES6 took a strong, opinionated stance towards using these points when it was possible to conceive of them, I think our experience shows that we should be more driven by use cases going forward, for deciding when to add this complexity.

I'm not sure if "performance" or "implementation complexity" is a strong argument for this group, but it's something that comes to mind for me when I think about this issue:

I did part of the work on the implementation of ES6 RegExp and Array subclassing in V8, and I deeply regret it. To avoid performance regressions while adding these observable points, we ended up adding "performance cliffs": The algorithm was something like, check whether anyone is observing it, if not take the fast path, otherwise take a new, slow path. Since my work there (which I'm not proud of at all), my colleagues in the V8 team have gone back and sped up many of those slow paths, but the performance difference remains, and an incredible amount of engineering effort went into this project.

Meanwhile, years later, it's still not clear what kind of software engineering benefit those changes brought to programmers. I don't get the impression that lots of people are deliberately subclassing RegExp and overriding exec for memoization. And if they do, they will find their RegExp operations run more slowly.

At the same time, if someone were to implement RegExp subclassing as a JavaScript library, rather than a revision of the built-in behavior, it would be significantly less source code (we're talking maybe 50-100 lines of JS), and they could invoke it where it's useful for them. It has never been clear to me why JavaScript needs a built-in framework for RegExp-like things, and it's not clear to me why we need such a framework for Set-like things.

@bakkot
Collaborator

bakkot commented Jan 17, 2019

I think our default should be simple, predictable algorithms

I think that iterating things using the iteration protocol is simpler and more predictable than reaching into internal slots. In general I think routing around existing, well-known points of extensibility is always going to be confusing.

It has never been clear to me why JavaScript needs a built-in framework for RegExp-like things, and it's not clear to me why we need such a framework for Set-like things.

I agree that we should not have decided that RegExp ought to be subclassable. But if we want to decide that Set should not be subclassable, we should do that deliberately and thoroughly. We shouldn't just cripple subclassing piecemeal.

(Incidentally, I've subclassed Set (and Map) a fair bit in my own code; I don't think it's that unusual. Set-like things are pretty common and impossible to implement completely in userland with the same algorithmic complexity. That is, after all, why we put them in the language.)

@littledan
Member

Incidentally, I've subclassed Set (and Map) a fair bit in my own code; I don't think it's that unusual. Set-like things are pretty common and impossible to implement completely in userland with the same algorithmic complexity. That is, after all, why we put them in the language.

This is really interesting. I'd like to hear more about your use cases for subclassing Set. Thinking about your code, how might it take advantage of the particular hook-ability we're talking about here? Would it come up? Subclassing Set in the way that it's supported today is more about new.target handling, which I think we all agree on.

@ljharb
Member Author

ljharb commented Jan 17, 2019

@bakkot how many of your use cases of subclassing Set could be handled by calling super with a rekey function?

@bakkot
Collaborator

bakkot commented Jan 17, 2019

@littledan I gave one example above, derived from something I actually did (which was ordering things in a slightly different way). I would expect passing instances of this to union or isSubsetOf or whatever would work, even when the receiver of those methods is not an instance of the subclass - after all, these instances work correctly everywhere else a Set does. That requires those methods to use the same mechanisms to interface with the subclass that any other code would.

Also, if I have multiple such subclasses coming from different places, I expect them to play nice with each other - for example, I would expect a.isSubsetOf(b) to work even if a and b are instances of two Set subclasses which don't know about each other, as long as they are otherwise well-behaved and a's class hasn't done anything weird with isSubsetOf. This can of course be made to work - it just requires a's class to have overridden isSubsetOf to do the obvious thing. But it seems like the implementation on the base class should just do the obvious thing.

As another example, I've made one-off implementations of MultiMap a bunch of times, though I don't think I've ever had occasion for MultiSet.

I have a Set which updates its iteration order when queried with has, so that the most recent items are first. That one actually would work OK with the [[SetData]] access, except that passing it to the built-in isSubsetOf would not reorder its contents, unlike every other access in the world.
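
(A hypothetical sketch of that kind of subclass, for illustration: a successful has() bumps the element so that iteration sees recently queried items first.)

// Hypothetical sketch: querying with `has` bumps the element, and iteration
// walks the underlying insertion order in reverse so recent items come first.
class RecencySet extends Set {
  has(value) {
    const found = super.has(value);
    if (found) {
      super.delete(value);
      super.add(value); // re-inserting moves it to the end of insertion order
    }
    return found;
  }
  *[Symbol.iterator]() {
    yield* [...super[Symbol.iterator]()].reverse();
  }
}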

I'm sure there's others I'm forgetting. There's a fairly common theme, though: these things customize add, has, and/or iteration and expect that consumers relying on updating, querying, or iterating sets will see their customizations. If union or isSubsetOf or whatever read from the [[SetData]] slot on their argument directly, these things will not work as arguments to them.

(These methods also would break when a subclass instance was used as their receiver, which means my subclasses would be incoherent until I updated them, but that's something I could eventually patch over by overriding those methods as well. I don't see any reason why they should break, but it's fixable.)

@ljharb The only ones I can recall for which rekey would have been useful are those which throw if certain invariants on the key are violated. rekey won't allow custom iteration order, distinguishing -0 and 0, storing one element multiple times, etc.

@ljharb
Member Author

ljharb commented Jan 17, 2019

why wouldn't rekey allow distinguishing -0 and 0? could you not do something like this?

function rekey(key) {
  if (typeof key === 'string') {
    return `string-${key}`;
  }
  if (typeof key === 'number' && key === 0) {
    return `zero-${Object.is(key, -0) ? 'negative' : 'positive'}`;
  }
  return key;
}

@bakkot
Collaborator

bakkot commented Jan 17, 2019

Ah, I didn't realize rekey mapped back when iterating.

@ljharb
Member Author

ljharb commented Jan 17, 2019

Assuming that it maps back on every operation into [[SetData]], it seems to me like it would reduce the number of needed extension points to virtually just the constructor, and possibly Symbol.iterator (altho i think iteration order would be tricky), which might make this proposal significantly less complicated while still allowing many use cases.

@bakkot
Collaborator

bakkot commented Jan 17, 2019

I don't know that using rekey rather than supporting subclasses which overwrite has would actually make this proposal any less complicated? Those two seem very similar in their effects on this proposal.

@ljharb
Member Author

ljharb commented Jan 18, 2019

what i mean is, when rekey lands, perhaps this proposal could only use SetData (and the iteration protocol, as needed), since most of the reasons to override, say, has, are subsumed by rekey?

@bakkot
Collaborator

bakkot commented Jan 18, 2019

I don't think all of my use cases for overriding has are subsumed by rekey, but yes, we could do that. I would regard that as essentially a decision that Set should not be subclassed and my remaining use cases are explicitly not going to be supported, even though they work fine today.

@gsathya
Member

gsathya commented Jan 18, 2019

Sorry for being pedantic but I don't want this to be misconstrued by anyone reading your comment without the whole context (which can often happen),

I would regard that as essentially a decision that Set should not be subclassed

If your definition of subclassing means "replaces a core set of methods and expects everything else to just work".

As others have chimed in here, that's not the obvious definition for a lot of folks, and there are several other use cases of subclassing that continue to work fine (for example, when you want to extend a Set).

@bakkot
Collaborator

bakkot commented Jan 18, 2019

Sorry, yes, there are still certain kinds of subclass which work. It just means that has would become de-facto final. You can override it, if you want, but the standard library will simply ignore your replacement. And because there's no other place you can hook, your subclass will necessarily be inconsistent - the behavior of membership testing for your class will vary depending on whether the code which is doing the testing happens to be in the standard library.

If your definition of subclassing means "replaces a core set of methods and expects everything else to just work".

It's not just the core set of methods. The proposal is that there should be no method which could be replaced which would make baseSet.isSubsetOf(subclassInstance) consume subclassInstance in the way which all other code accepting a subclass of Set would consume it. I am fine with (not enthusiastic about, but fine with) replacing all the methods if that's what it takes to behave consistently. But if a class is subclassable, I expect that replacing all of the methods should be sufficient.

The existence of methods which read [[SetData]] off their argument without passing through any user-hookable method would mean that replacing all of the methods would not be sufficient. Many kinds of Set subclass which today behave consistently would not be able to behave consistently in that world.

@ljharb
Member Author

ljharb commented Jan 18, 2019

Isn't that how the size getter already works?

@bakkot
Collaborator

bakkot commented Jan 18, 2019

The size getter can be overridden.

@ljharb
Member Author

ljharb commented Jan 18, 2019

ah, and in the event that something read the size, you’d want it to use your overridden getter instead of “the number of elements in [[SetData]]”, for example

@bakkot
Collaborator

bakkot commented Jan 18, 2019

Yes. At least in the cases where that something was reading the size of an argument; I would prefer it read the size of its receiver in the same way, so that subclasses would not be forced to override it, but as long as subclasses could override it that would be possible to work around.
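
(For illustration, a hypothetical sketch of a subclass whose overridden size getter matters; a method that counted [[SetData]] entries directly would disagree with it.)

// Hypothetical sketch: a subclass that hides some entries, so its reported
// size (and membership, and iteration) differ from what [[SetData]] holds.
class FilteredSet extends Set {
  #hidden = new Set();
  hide(value) {
    if (super.has(value)) this.#hidden.add(value);
    return this;
  }
  get size() { return super.size - this.#hidden.size; }
  has(value) { return super.has(value) && !this.#hidden.has(value); }
  *[Symbol.iterator]() {
    for (const value of super[Symbol.iterator]()) {
      if (!this.#hidden.has(value)) yield value;
    }
  }
}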

@littledan
Member

I don't think this change means has would be de-facto final. Your code sample would continue to work fine, even if it doesn't get a transparent upgrade to the new feature. The private methods proposal that we have been collaborating on gives exactly the tools for JavaScript to do the kind of things we are talking about doing in Set, because reliable, predictable behavior is genuinely useful in various circumstances.

@bakkot
Collaborator

bakkot commented Jan 18, 2019

@littledan, again, it's not just that it would not automatically work with the new feature, it's that it could not be made to work with it (in a way consistent with its other behavior).

@ajklein

ajklein commented Jan 18, 2019

While I find myself swayed by arguments on both sides of this discussion, I agree with @bakkot that the [[SetData]]-centric approach laid out by @domenic is staking out a significant claim about the subclassability of Set. I think that should be decided on its own merits, rather than happening merely as a consequence of some choice made in a spec algorithm.

@Ginden
Collaborator

Ginden commented Jan 19, 2019

I don't know what to think yet about most of the issues raised in this thread, but I consider these important:

  • All methods should use SpeciesConstructor to create the new set.
  • All methods should allow any iterable argument, not only sets.

@bmeck
Member

bmeck commented Jul 26, 2019

I'm interested in the points brought up here. I was asked to look into this issue after talking about a Map.prototype.updateOrInsert() method. I think categorizing use cases can be enlightening when discussing whether something should delegate to subclass property lookup. I haven't thought as much about specialized Sets, so some of the following will be focused on Maps; apologies.

I find that subclassing can introduce issues when features are conceptually able to be mixed, but a single inheritance chain makes combining them complex. For example, if we have two classes:

// a set that only adds adults
class AdultsSet extends Set {
  add(person) {
    if (person.age < LEGAL_ADULT_AGE) { return; }
    else super.add(person);
  }
}
// a set that only adds people within a specific geographic region
class GeographicSet extends Set {
  #region;
  constructor(region) {
    super(); // a derived class constructor must call super() before touching `this`
    this.#region = region;
  }
  add(person) {
    if (person.region !== this.#region) { return; }
    else super.add(person);
  }
}

It is plausible to want to mix the features of these two Sets to create a third type that only holds adults within a region. I do not think subclassing aids in mixing these types to create that third type; in fact it can hinder it. In particular, these specializations do not change the expected behaviors of a Set, and so they seem more likely than other types to be candidates for mixing.

However, in other scenarios, like a multimap, fundamental parts of the expectations can be confused. A key may still be in the collection even after a successful delete, etc. This does seem like a good place for subclassing, because it doesn't adhere to some expectations but wishes to use the same API surface, for various reasons. Even with these fundamental differences in expectations you could imagine something like an AdultMultiMap that only allows adults. Given the different expectations for the MultiMap, it seems like any specialization, such as limiting to adults, would also need to take that into account.

I am concerned with introducing these subclass design protocols in light of having examples that fundamentally seem to invalidate parts of protocol expectations. I worry that any design constraints expected by the subclassing will either not be enforceable or actively prevent interesting use cases.

In addition, with the exposure of subclassing design protocols we do not have a clean way of mixing disparate inheritance chains. In the past some ways to ease this problem, such as mixins, have been proposed, but those seem to further complicate the story of how to properly design subclasses.
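
(For what it's worth, a hypothetical sketch of the class-factory style of mixin alluded to here; LEGAL_ADULT_AGE and the 'EU' region value are placeholders. It lets the two filters stack, though it pushes the composition question onto whoever assembles the final class.)

// Each specialization is written against an arbitrary base class, so the
// filters can be stacked without committing to one fixed inheritance chain.
const OnlyAdults = Base => class extends Base {
  add(person) {
    if (person.age < LEGAL_ADULT_AGE) return this;
    return super.add(person);
  }
};

const OnlyRegion = region => Base => class extends Base {
  add(person) {
    if (person.region !== region) return this;
    return super.add(person);
  }
};

// A third type that only holds adults within a given region:
class AdultsInEURegionSet extends OnlyAdults(OnlyRegion('EU')(Set)) {}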


On the other side, by using internal fields directly we do not allow the same easy subclassing story, even in isolation, for things like AdultsSet above.

I wonder if we could try to maximize designs such that simple cases (ones that do not have complex protocol needs or fundamentally different behavior from the parent class) can be handled without subclassing. This would put the burden of combining specializations on the top-level base class for most cases, but would still allow complex subclasses to create the necessary specializations. I think expanding the base class could alleviate some of the design complexity, rather than trying to create perfect protocols for how all subclasses should work. For Sets, I would also add that creating a relatively complete design for a protocol is likely more feasible than for Maps, I would imagine.

@bakkot
Collaborator

bakkot commented Sep 17, 2022

There has been a great deal of discussion on this topic, which expanded beyond the original question into the more general question of how we should operate on the receiver and the argument, both in this thread and in committee. The resolution that we came to was basically:

  • subclasses are expected to override all of these methods if they want something other than "use the internal slot and produce a base Set"
    • i.e., these methods read from this.[[SetData]] directly
    • and there is no use of Symbol.species to produce the resulting value; it's always a base Set
  • arguments should be consumed with their public interface, not the internal slots, so you can build a thing which encapsulates a Set rather than extending it, and still pass that thing as an argument to these methods on the base Set type
    • concretely, that means these methods all do eager lookups of .size, .has, and .keys on their argument, and then repeatedly invoke either the has method or the next method produced by calling keys.
  • all of these methods, even union, expect their argument to be a Set or something which looks like it; there's no support for iterables (instead you are expected to wrap an iterable in new Set(it)).

This achieves the constraint that it is possible to pass a duck-typed Set as an argument, which many people (including myself) held to be important, while providing no assistance for subclassing without overriding all of these methods (and therefore entailing no extra complexity toward that end).
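
Concretely, an argument only needs size, has, and keys to satisfy that contract, so a wrapper around a Set (rather than a subclass of it) works fine. A sketch, with a hypothetical name:

// Sketch of a "Set-like" argument: it exposes size, has, and keys,
// but is not itself a Set and does not extend one.
class ReadOnlySetView {
  #backing;
  constructor(backing) {
    this.#backing = backing; // an actual Set being encapsulated
  }
  get size() { return this.#backing.size; }
  has(value) { return this.#backing.has(value); }
  keys() { return this.#backing.keys(); }
}

// Usable as the argument to the new methods (where they are available):
const view = new ReadOnlySetView(new Set([2, 3, 4]));
new Set([1, 2, 3]).intersection(view); // => Set {2, 3}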

See further discussion of these and other decisions in details.md.


This still does not quite achieve the "eagerly convert, then do all computation, then return the result" goal @domenic argued for above, because that goal is impossible while still achieving the optimal time complexity - with methods like intersection, it's necessary to repeatedly invoke arg.has in some cases rather than iterating the entire argument. But it's as close as possible while still achieving both optimal time complexity and the goal of accepting duck-typed Sets: the only observable parts of the algorithms are in the consumption of the argument, rather than in the consumption of this or the creation of the final result, and there's never interleaving of calls to two unrelated observable functions.


Also, on the original question, add and delete are now never invoked. Instead, the [[SetData]] list for the ultimate set is constructed directly. This is actually even less observable than invoking new Set would be, because the Set constructor looks up and invokes the current value of Set.prototype.add.


I'm closing this as resolved. Since this discussion has covered a lot of ground, if you'd like to revisit any of the points I've made in this comment, please open a new issue rather than continuing in this thread.

bakkot closed this as completed Sep 17, 2022
bakkot unpinned this issue Sep 17, 2022