This repository has been archived by the owner on Jul 8, 2024. It is now read-only.

Avoid "delete" and "add"? #50

Closed
ljharb opened this issue Jan 15, 2019 · 75 comments

@ljharb
Member

ljharb commented Jan 15, 2019

After the latest round of changes, I wanted to start fresh and summarize my updated thought process:

Produce a new set

These construct a new set from the receiver, via its species constructor.

  • union
  • intersection
  • difference
  • symmetricDifference

"Addable"

These create a contract around having an "add" method.

  • union
  • intersection
  • symmetricDifference

"Queryable"

These create a contract around having a "has" method.

  • intersection
  • isSubsetOf
  • isSupersetOf
  • isDisjointFrom

"Deletable"

These create a contract around having a "delete" method.

  • difference
  • symmetricDifference

Generic receiver

These work with a non-Set iterable receiver:

  • isSubsetOf
  • isSupersetOf
  • isDisjointFrom

It seems like it'd be nicer to have only one string-based protocol added by this proposal instead of three. Is there any way we could build up an iterable of "items to add" in all addable/deletable cases, and construct the new set with that, rather than needing the "add" or "delete" method?
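
Roughly the shape I have in mind, using difference as an example (a hand-wavy userland sketch, not spec text; SameValueZero and duplicate handling are left to the constructor):

// Sketch only: build up a plain list of "items to keep", then hand the whole
// list to the (species) constructor, so the algorithm itself never calls
// `add` or `delete`.
Set.prototype.difference = function (iterable) {
  const Ctor = (this.constructor && this.constructor[Symbol.species]) || Set;
  const toExclude = new Set(iterable); // membership testing only, for the sketch
  const kept = [];
  for (const item of this) {
    if (!toExclude.has(item)) {
      kept.push(item);
    }
  }
  return new Ctor(kept); // the constructor is what ends up calling `add`
};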

@bakkot
Collaborator

bakkot commented Jan 15, 2019

Note that the Set constructor already creates a contract around having .add, though it's hard to observe.

Also, I think it'd be helpful to this issue to split out whether the contract applies to the receiver, to instances of the receiver's species constructor, or to the argument. Contracts which apply only to the receiver or its species constructor seem like a different sort of thing than contracts which apply to arguments.

@gsathya
Member

gsathya commented Jan 15, 2019

It seems like it'd be nicer to have only one string-based protocol added by this proposal instead of three.

IMO, there's only one protocol added by this proposal, and that's the has-protocol, added largely for isSubsetOf (but also intersection), which works directly on the argument/receiver. The rest of these methods work on an object constructed using the SpeciesConstructor, so I wouldn't necessarily classify these as new protocols. I expect these to be subclasses of Set which hold these methods.

Is there any way we could build up an iterable of "items to add" in all addable/deletable cases, and construct the new set with that, rather than needing the "add" or "delete" method?

The biggest problem with this is: how would this work with subclassing? Also, given the above about how these aren't necessarily new protocols, I'm not sure if there's a lot of value in trying to do this.

@ljharb
Member Author

ljharb commented Jan 15, 2019

Since the entire list would be passed into the subclass's constructor, the subclass would handle it in its constructor (which indeed calls add, as @bakkot pointed out).

"delete" seems to be a new protocol, though.

@bakkot
Collaborator

bakkot commented Jan 15, 2019

@ljharb To be clear, you are proposing that Set.prototype.intersection would look something like

// assume the existence of a CreateListFromIterable abstract operation

1. Let _set_ be the *this* value.
1. If Type(_set_) is not Object, throw a *TypeError* exception.
1. Let _Ctr_ be ? SpeciesConstructor(_set_, %Set%).
1. Let _thisIteratorRecord_ be ? GetIterator(_set_).
1. Let _otherIteratorRecord_ be ? GetIterator(_iterable_).
1. Let _thisItems_ be ? CreateListFromIterable(_thisIteratorRecord_).
1. Let _otherItems_ be ? CreateListFromIterable(_otherIteratorRecord_).
1. Let _bothItems_ be a new empty List.
1. For each element _thisItem_ of _thisItems_, do
    1. For each element _otherItem_ of _otherItems_, do
        1. If SameValueZero(_thisItem_, _otherItem_) is *true*, add _thisItem_ to _bothItems_.
1. (Possibly some sort of "filter out duplicates per SameValueZero" step here.)
1. Let _array_ be CreateArrayFromList(_bothItems_).
1. Return ? Construct(_Ctr_, _array_).

? And similar changes for union, intersection, difference, and symmetricDifference, but no changes to isSubsetOf, isSupersetOf, or isDisjointFrom?

@ljharb
Member Author

ljharb commented Jan 15, 2019

I think "filter out duplicates per SameValueZero" can be handled by the constructor, but yep, exactly that. I'm sure we could abstract most of those steps into an abstract op as well, a la AddEntriesFromIterable.

@bakkot
Collaborator

bakkot commented Jan 15, 2019

That means that intersection is O(len(this) + len(other)), instead of O(len(other)). That seems unfortunate.

@bakkot
Collaborator

bakkot commented Jan 15, 2019

Though I guess if that's a major concern, difference ought to treat its argument the same way isSubsetOf does. (Actually, maybe it should anyway?)

@gsathya, might it be worth doing this so as to avoid having to guard Set.prototype.delete? Not necessarily advocating, just asking.

@gsathya
Member

gsathya commented Jan 15, 2019

? And similar changes for union, intersection, difference, and symmetricDifference, but no changes to isSubsetOf, isSupersetOf, or isDisjointFrom?

I'd love to do this: remove all lookups, operate on an internal list and create a Set from it at the end. The three main reasons I didn't do this are:
(a) Consistency: we'd be moving away from the pattern used by the Set constructor (which looks up the add method). I wasn't sure if the committee would be happy with this.
(b) Subclassing becomes less useful if internal operations are going to bypass overridden methods
(c) The runtime cost that @bakkot described (...although this won't happen in almost all common cases)

I don't feel too strongly about any of these but I bet the committee does.

Though I guess if that's a major concern, difference ought to treat its argument the same way isSubsetOf does. (Actually, maybe it should anyway?)

Can you be more specific about the changes here?

@gsathya, might it be worth doing this so as to avoid having to guard Set.prototype.delete? Not necessarily advocating, just asking.

That's an excellent observation. It'd be great to get rid of this call, definitely.

@bakkot
Collaborator

bakkot commented Jan 15, 2019

remove all lookups, operate on an internal list and create a Set

To make sure we're all on the same page, I was assuming that there would still be lookups for Symbol.iterator and 'has', and a call to the species constructor, which would itself call add (typically). But the actual logic would be internal operations on spec Lists, yes, and would thereby avoid calls to delete and add (except indirectly by calling the constructor).

Consistency: we'd be moving away from the pattern used by the Set constructor (which looks up the add method). I wasn't sure if the committee would be happy with this.

FWIW I personally wouldn't object: as long as it calls the species constructor with the list it's come up with, rather than manipulating internal slots directly, I think it's sufficiently consistent. No idea what the rest of the committee would think.

Subclassing becomes less useful if internal operations are going to bypass overridden methods

If those methods are defined purely in terms of Symbol.iterator, 'has', and calls to the species constructor, I would expect that to be OK: a well-behaved subclass will need to make those work consistently too. Again, no idea what the rest of the committee would think.

The runtime cost that @bakkot described

I think this can also be avoided if you're willing to go with a somewhat hybrid approach, as long as everyone is on board with the "has" thing (on receivers). Something like

Set.prototype.intersection = function (iterable) {
  let Ctr = SpeciesConstructor(this, Set);
  let bothItems = []; // assume this is a spec list
  for (let element of iterable) {
    if (this.has(element)) {
      bothItems.push(element);
    }
  }
  bothItems = FilterSameValueZeroDuplicates(bothItems);
  return new Ctr(bothItems);
}

Can you be more specific about the changes here?

I am envisioning something like

Set.prototype.difference = function (iterable) {
  let Ctr = SpeciesConstructor(this, Set);
  let newSet = new Ctr;

  let otherSet = iterable;
  let hasCheck = otherSet.has;
  if (typeof hasCheck !== 'function') {
    otherSet = new Set(iterable);
    hasCheck = otherSet.has;
    if (typeof hasCheck !== 'function') throw new TypeError;
  }

  // assuming we don't take the approach suggested in this issue
  // if we did, presumably we'd build up the list of elements internally, then call Ctr with it

  for (let element of this) {
    if (!hasCheck.call(otherSet, element)) {
      newSet.add(element);
    }
  }
  return newSet;
}

@bakkot
Collaborator

bakkot commented Jan 15, 2019

I can make a PR so we can all see the diff, if you want, though I can't guarantee I'd get to it for a day or two.

@gsathya
Member

gsathya commented Jan 15, 2019

To make sure we're all on the same page, I was assuming that there would still be lookups for Symbol.iterator and 'has', and a call to the species constructor, which would itself call add (typically). But the actual logic would be internal operations on spec Lists, yes, and would thereby avoid calls to delete and add (except indirectly by calling the constructor).

ACK. This matches what I was thinking as well.

I think this can also be avoided if you're willing to go with a somewhat hybrid approach, as long as everyone is on board with the "has" thing (on receivers). Something like

LGTM

I am envisioning something like

Looks like we have two allocations here.. that seems unfortunate. Although I can probably optimize that away for the common cases.

I can make a PR so we can all see the diff, if you want, though I can't guarantee I'd get to it for a day or two.

Sounds great. Thanks!

@bakkot
Collaborator

bakkot commented Jan 15, 2019

Looks like we have two allocations here.. that seems unfortunate.

Do you mean in the typeof hasCheck !== 'function' branch, or something else? That's so, but matches the same branch for isSubsetOf in its current state, which causes that method to have an allocation where otherwise there would be none. (The approach taken in #43 would get rid of those allocations :D )

@Ginden
Collaborator

Ginden commented Jan 16, 2019

Also, I think it'd be helpful to this issue to split out whether the contract applies to the receiver, to instances of the receiver's species constructor, or to the argument. Contracts which apply only to the receiver or its species constructor seem like a different sort of thing than contracts which apply to arguments.

Only isSubsetOf checks for the presence of a has method on its argument. This can (and probably should) be avoided by always calling Construct(%Set%, iterable).

Other methods enforce contracts on either this (the receiver) or the result of Construct(SpeciesConstructor(this, %Set%)).

@bakkot
Collaborator

bakkot commented Jan 16, 2019

Made a PR: #51.

@domenic
Member

domenic commented Jan 16, 2019

Hey folks, watching this from the sidelines and I continue to be struck by the over-complexity that we are injecting into the spec, in the name of genericism, subclassability, and the desire to touch the public API as many times as possible in order to give other classes (or other thisArgs) a chance to insert their own behaviors. As someone who did the same thing with promises in ES2015, and lived to regret it greatly, I find it heartbreaking to see those same patterns propagated further. I'd hoped we'd do things differently going forward.

As such, I want to present some alternatives that I've written up. I've done only the difference() method, but you can apply the same principles to all of them.

The proposed spec text is at https://gist.github.com/domenic/44773a7610dc8541bf2a83df2a3ce990. The guiding principles, which I think are fairly generally applicable, are:

  • Convert from user-hookable objects (Set or iterable) to spec-internal data structures ASAP, "at the boundary".
  • Operate entirely on the spec-internal data structures for the actual algorithm you're performing.
  • Convert back to user-hookable objects right before returning to the caller, no sooner
  • Extract and emplace data out of/into Sets by using their internal slots as the source of truth
  • Hard-code your object creation to not be hookable
    • I'm unsure on this particular principle, as it just seems kind of nicer to use new this.constructor(). But hard-coding is way nicer than the gross species dance, and subclassing in general has not panned out well. The dream that "all the inherited methods will automatically do the right thing" that drives using new (this.constructor && this.constructor[Symbol.species] || Set)() has failed pretty badly, so I'd rather abandon it than keep it up.

I'm unsure how others feel about this, but there has been a lot of discussion, so I thought it couldn't hurt to try to put something out there that IMO is a lot better as a way forward.
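
To make the shape concrete, a userland approximation of the difference() text in that gist might look like this (a sketch only; the real spec text reads [[SetData]] directly, which userland code can only approximate by snapshotting via iteration):

// Sketch: convert at the boundary, compute on plain data, build the result at the end.
Set.prototype.difference = function (iterable) {
  // 1. Snapshot the receiver and the argument into internal-style lists.
  const thisItems = [...this];           // stands in for reading this.[[SetData]]
  const otherItems = new Set(iterable);  // stands in for CreateListFromIterable + dedupe
  // 2. Run the whole algorithm on that data, with nothing observable in the middle.
  const resultItems = thisItems.filter(item => !otherItems.has(item));
  // 3. Convert back to a user-visible object only at the very end (hard-coded to Set).
  return new Set(resultItems);
};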

@gsathya
Member

gsathya commented Jan 17, 2019

Hey folks, watching this from the sidelines and I continue to be struck by the over-complexity that we are injecting into the spec, in the name of genericism, subclassability, and the desire to touch the public API as many times as possible in order to give other classes (or other thisArgs) a chance to insert their own behaviors

The over-complexity is exactly the problem I'm struggling with too. I'm worried that we're trying to make this work for everyone's use case, turning this into a bit of a kitchen sink.

The proposed spec text is at https://gist.github.com/domenic/44773a7610dc8541bf2a83df2a3ce990.

I like this approach a lot, makes everything way simpler. It's straightforward to implement and optimize, but cripples subclassing... which is totally fine by me.

I'm slightly worried about this turning into a discussion about subclassing in general (and not about set methods), but I'd like to hear more from others about this approach.

@littledan @zenparsing WDYT?

@domenic
Member

domenic commented Jan 17, 2019

I think "cripples subclassing" is overstating it a bit. It's important to separate subclassing and the desire to touch public APIs.

Subclassing still works fine in general. The only thing that might be slightly surprising is that you get back Set instances instead of Subclass instances. But we could back off a bit and re-include new this.constructor() (or, if you must, new (this.constructor && this.constructor[Symbol.species] || Set)()).

It's the pattern of going through as many public APIs as possible, in order to allow arguments or thisArgs to customize the algorithm's behavior, that the approach specifically avoids. Instead it treats the [[SetData]] as the source of truth---but any well-behaved subclass (i.e. one that calls super constructor and super methods) will work fine with that constraint.

@gsathya
Member

gsathya commented Jan 17, 2019

It's important to separate subclassing and the desire to touch public APIs.

Interesting distinction. In my mind, the reason to use the public APIs is to allow subclasses to do this exact sort of customization. But I think your point is that subclassing is more about "extending the behavior" rather than "replacing the behavior", in which case you're right, subclassing isn't affected as much.

@zenparsing
Member

I agree that Symbol.species leaves us with an... interesting legacy moving forward. On the other hand, having most of the current built-ins use species and some new ones not would be a different form of unfortunate legacy.

Leaving aside species, I agree with the general design goal of avoiding unnecessary customizability "in the middle" of these algorithms. I need to think a little more about the semantics for each of these operations before I can comment usefully, though.

@bakkot
Collaborator

bakkot commented Jan 17, 2019

It's the pattern of going through as many public APIs as possible, in order to allow arguments or thisArgs to customize the algorithm's behavior

I don't think "customize the algorithm's behavior" is the right way to think of this. The point of Symbol.iterator is to allow consumers to iterate over the contents a container. Algorithms which want to iterate over the contents of a container should therefore use Symbol.iterator. Internal slots are precisely that - internal. Subclasses shouldn't have to worry about them as long as they maintain a consistent public API. (I very much do not agree that a well-behaved subclass must always call its super methods.)

I don't altogether care that much about species, and I agree it's perfectly reasonable to make the internal logic unhookable (cf #51), but I don't think it's correct to think of the "get contents" / "query membership" operations on the receiver or argument as being internal.

@zenparsing
Member

I'm probably missing some context, but here are some late night thoughts:

Currently, the extensibility point for Set is the "add" method. It provides subclasses with the ability to map elements, filter the range of acceptable elements, etc. If we are going to support subclassing, then my first inclination would be to retain "add" as the only extensibility point, and use [[SetData]] for everything else.

For the operations that generate new sets, I would probably start by seeing if I could create a new, empty instance with the species constructor and then directly use "add" to build it up. I would try to avoid creating an array just to pass it into the species constructor.

I agree with @domenic insofar as I don't think we should feel compelled to make these algorithms generic over the receiver. We should be able to use [[SetData]] and not have to go through the ceremony of converting thisArg.[[SetData]] into an iterator.

@domenic
Member

domenic commented Jan 17, 2019

Algorithms which want to iterate over the contents of a container should therefore use Symbol.iterator.

Sure, if you want to accept iterators, you should go through Symbol.iterator, as my second example does. My contention though is that if you want to accept Sets (as either thisArgs or arguments), you should go through [[SetData]] as the source of truth. You should not go through has()/delete()/add().

@bakkot
Collaborator

bakkot commented Jan 17, 2019

If what you want to do with a Set is to iterate it, you should go through Symbol.iterator. Treating [[SetData]] as the source of truth for how to iterate means breaking subclassing.

For example, here is a perfectly reasonable Set subclass; it mostly just orders things differently. Why should Set.prototype.union be forced to give an answer inconsistent with the other methods when one of these is its receiver or argument? If it goes through Symbol.iterator, it will automatically be consistent.
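
(For illustration, a hypothetical sketch of that kind of subclass; the original comment linked to a concrete example. It keeps Set's membership semantics but exposes a different iteration order through the public protocol.)

// Hypothetical sketch: same membership semantics as Set, but the public
// iteration protocol yields elements in sorted order instead of insertion order.
class SortedSet extends Set {
  *[Symbol.iterator]() {
    // Default sort for brevity; a real version would take a comparator.
    yield* [...super[Symbol.iterator]()].sort();
  }
  keys() { return this[Symbol.iterator](); }
  values() { return this[Symbol.iterator](); }
  *entries() {
    for (const v of this) yield [v, v];
  }
  forEach(fn, thisArg) {
    for (const v of this) fn.call(thisArg, v, v, this);
  }
}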

We already have a protocol for iterating things. If the only thing an algorithm is doing with a value is iterating over its contents, it should use this protocol. Even if there is an internal slot it could use instead.

@domenic
Member

domenic commented Jan 17, 2019

I guess I don't have much to say other than that I disagree. When operating on a Set, [[SetData]] is a better source of truth in my opinion than the monkey-patchable public API. Subclasses which don't keep an accurate [[SetData]] are not, in my opinion, "perfectly reasonable".

@bakkot
Collaborator

bakkot commented Jan 17, 2019

I guess I assume that subclasses should not have to care (or think) about the contents of internal slots, as long as the public API does the right thing. That seems almost tautological. That is what "internal" means.

@domenic
Member

domenic commented Jan 17, 2019

What we're discussing here is precisely what the public API will do. So I think with either design, subclasses that care about making the public API do the right thing will be well-behaved (another tautology).

gsathya mentioned this issue Jan 17, 2019
@gsathya
Member

gsathya commented Jan 17, 2019

We already have a protocol for iterating things. If the only thing an algorithm is doing with a value is iterating over its contents, it should use this protocol. Even if there is an internal slot it could use instead.

Why don't the existing Set methods use the iteration protocol? If this isn't a problem for existing methods, why is this a problem now?

@bakkot
Collaborator

bakkot commented Jan 17, 2019

If x is an instance of a subclass, and y is not, x cannot make y.union(x) do the right thing unless Set.prototype.union uses the iteration protocol rather than [[SetData]].

But also, the public API already exists. My example above is currently well behaved. Adding new methods to Set.prototype should not break that unless there's good reason to. And "we want to route around existing protocols for answering the specific question you have" is not a good reason.

Why don't the existing Set methods use the iteration protocol? If this isn't a problem for existing methods, why is this a problem now?

None of them take a set as an argument, only a receiver, so a subclass can override their behavior.

(But also, they should, and I would very much like to change that.)

@bakkot
Collaborator

bakkot commented Jan 17, 2019

it's crucial that, given an arbitrary Set that you don't control (whose methods you don't have the ability to edit), that an argument passed to that Set's methods be able to customize what those methods do

That's... kind of an inside-out way of looking at it. What I would say is that it is crucial that a Set subclass be able to define how to add to, query membership in, and iterate that subclass. Since it happens that "is subset of" is defined only in terms of those operations, yes, that means the behavior of isSubsetOf should be affected in a particular way by those customizations, in exactly the same way as any other consumer would be.

a Set subclass which lacks the capability to exert this power on methods that receive instances of the subclass is incorrect

The Set subclass I gave above would behave inconsistently when passed to the hypothetical isSubsetOf. It is currently a correct subclass, in the sense that it behaves consistently when passed to anything that accepts Set subclasses - certainly when passed to a userland implementation of isSubsetOf. I think this is a fairly universal definition of "correct subclass". If we added an isSubsetOf which queried membership by reading [[SetData]], it would not be and could not be.

@littledan
Member

I want to second the voices here arguing for conversion or checks "at the boundary" rather than many observable points throughout the algorithm.

Talking about "subclassable builtins" can be confusing: The first part of subclassability is new.target, and we're good on all that (even for web platform objects, even though WebIDL doesn't currently specify it!). I'm genuinely happy about that change; it allows simple subclassing that adds methods, fields, etc. And the second part is adding in these extra points to modify behavior. I think calling into dynamically dispatched methods can be good sometimes, but I'm still not convinced that it should be our default.

I like the idea of object oriented design, and using method dispatch where it makes sense, but I think our default should be simple, predictable algorithms, and we should call into observable points where we have a particular use case for them. Although ES6 took a strong, opinionated stance towards using these points when it was possible to conceive of them, I think our experience shows that we should be more driven by use cases going forward, for deciding when to add this complexity.

I'm not sure if "performance" or "implementation complexity" is a strong argument for this group, but it's something that comes to mind for me when I think about this issue:

I did part of the work on the implementation of ES6 RegExp and Array subclassing in V8, and I deeply regret it. To avoid performance regressions while adding these observable points, we ended up adding "performance cliffs": The algorithm was something like, check whether anyone is observing it, if not take the fast path, otherwise take a new, slow path. Since my work there (which I'm not proud of at all), my colleagues in the V8 team have gone back and sped up many of those slow paths, but the performance difference remains, and an incredible amount of engineering effort went into this project.

Meanwhile, years later, it's still not clear what kind of software engineering benefit those changes brought to programmers. I don't get the impression that lots of people are deliberately subclassing RegExp and overriding exec for memoization. And if they do, they will find their RegExp operations run more slowly.

At the same time, if someone were to implement RegExp subclassing as a JavaScript library, rather than a revision of the built-in behavior, it would be significantly less source code (we're talking maybe 50-100 lines of JS), and they could invoke it where it's useful for them. It has never been clear to me why JavaScript needs a built-in framework for RegExp-like things, and it's not clear to me why we need such a framework for Set-like things.

@bakkot
Collaborator

bakkot commented Jan 17, 2019

I think our default should be simple, predictable algorithms

I think that iterating things using the iteration protocol is simpler and more predictable than reaching into internal slots. In general I think routing around existing, well-known points of extensibility is always going to be confusing.

It has never been clear to me why JavaScript needs a built-in framework for RegExp-like things, and it's not clear to me why we need such a framework for Set-like things.

I agree that we should not have decided that RegExp ought to be subclassable. But if we want to decide that Set should not be subclassable, we should do that deliberately and thoroughly. We shouldn't just cripple subclassing piecemeal.

(Incidentally, I've subclassed Set (and Map) a fair bit in my own code; I don't think it's that unusual. Set-like things are pretty common and impossible to implement completely in userland with the same algorithmic complexity. That is, after all, why we put them in the language.)

@littledan
Member

Incidentally, I've subclassed Set (and Map) a fair bit in my own code; I don't think it's that unusual. Set-like things are pretty common and impossible to implement completely in userland with the same algorithmic complexity. That is, after all, why we put them in the language.

This is really interesting. I'd like to hear more about your use cases for subclassing Set. Thinking about your code, how might it take advantage of the particular hook-ability we're talking about here? Would it come up? Subclassing Set in the way that it's supported today is more about new.target handling, which I think we all agree on.

@ljharb
Member Author

ljharb commented Jan 17, 2019

@bakkot how many of your use cases of subclassing Set could be handled by calling super with a rekey function?

@bakkot
Collaborator

bakkot commented Jan 17, 2019

@littledan I gave one example above, derived from something I actually did (which was ordering things in a slightly different way). I would expect passing instances of this to union or isSubsetOf or whatever would work, even when the receiver of those methods is not an instance of the subclass - after all, these instances work correctly everywhere else a Set does. That requires those methods to use the same mechanisms to interface with the subclass that any other code would.

Also, if I have multiple such subclasses coming from different places, I expect them to play nice with each other - for example, I would expect a.isSubsetOf(b) to work even if a and b are instances of two Set subclasses which don't know about each other, as long as they are otherwise well-behaved and a's class hasn't done anything weird with isSubsetOf. This can of course be made to work - it just requires a's class to have overridden isSubsetOf to do the obvious thing. But it seems like the implementation on the base class should just do the obvious thing.

As another example, I've made one-off implementations of MultiMap a bunch of times, though I don't think I've ever had occasion for MultiSet.

I have a Set which updates its iteration order when queried with has, so that the most recent items are first. That one actually would work OK with the [[SetData]] access, except that passing it to the built-in isSubsetOf would not reorder its contents, unlike every other access in the world.
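
(A hypothetical sketch of that kind of subclass, for illustration: a successful has() bumps the element so that iteration sees recently queried items first.)

// Hypothetical sketch: querying with `has` bumps the element, and iteration
// walks the underlying insertion order in reverse so recent items come first.
class RecencySet extends Set {
  has(value) {
    const found = super.has(value);
    if (found) {
      super.delete(value);
      super.add(value); // re-inserting moves it to the end of insertion order
    }
    return found;
  }
  *[Symbol.iterator]() {
    yield* [...super[Symbol.iterator]()].reverse();
  }
}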

I'm sure there's others I'm forgetting. There's a fairly common theme, though: these things customize add, has, and/or iteration and expect that consumers relying on updating, querying, or iterating sets will see their customizations. If union or isSubsetOf or whatever read from the [[SetData]] slot on their argument directly, these things will not work as arguments to them.

(These methods also would break when a subclass instance was used as their receiver, which means my subclasses would be incoherent until I updated them, but that's something I could eventually patch over by overriding those methods as well. I don't see any reason why they should break, but it's fixable.)

@ljharb The only ones I can recall for which rekey would have been useful are those which throw if certain invariants on the key are violated. rekey won't allow custom iteration order, distinguishing -0 and 0, storing one element multiple times, etc.

@ljharb
Member Author

ljharb commented Jan 17, 2019

why wouldn't rekey allow distinguishing -0 and 0? could you not do something like this?

function rekey(key) {
  if (typeof key === 'string') {
    return `string-${key}`;
  }
  if (typeof key === 'number' && key === 0) {
    return `zero-${Object.is(key, -0) ? 'negative' : 'positive'}`;
  }
  return key;
}

@bakkot
Collaborator

bakkot commented Jan 17, 2019

Ah, I didn't realize rekey mapped back when iterating.

@ljharb
Member Author

ljharb commented Jan 17, 2019

Assuming that it maps back on every operation into [[SetData]], it seems to me like it would reduce the number of needed extension points to virtually just the constructor, and possibly Symbol.iterator (altho i think iteration order would be tricky), which might make this proposal significantly less complicated while still allowing many use cases.

@bakkot
Collaborator

bakkot commented Jan 17, 2019

I don't know that using rekey rather than supporting subclasses which overwrite has would actually make this proposal any less complicated? Those two seem very similar in their effects on this proposal.

@ljharb
Member Author

ljharb commented Jan 18, 2019

what i mean is, when rekey lands, perhaps this proposal could only use SetData (and the iteration protocol, as needed), since most of the reasons to override, say, has, are subsumed by rekey?

@bakkot
Collaborator

bakkot commented Jan 18, 2019

I don't think all of my use cases for overriding has are subsumed by rekey, but yes, we could do that. I would regard that as essentially a decision that Set should not be subclassed and my remaining use cases are explicitly not going to be supported, even though they work fine today.

@gsathya
Member

gsathya commented Jan 18, 2019

Sorry for being pedantic but I don't want this to be misconstrued by anyone reading your comment without the whole context (which can often happen),

I would regard that as essentially a decision that Set should not be subclassed

If your definition of subclassing means "replaces a core set of methods and expects everything else to just work".

As others have chimed in here, that's not the obvious definition for a lot of folks, and there are several other use cases of subclassing that continue to work fine (for example, when you want to extend a Set).

@bakkot
Collaborator

bakkot commented Jan 18, 2019

Sorry, yes, there are still certain kinds of subclass which work. It just means that has would become de-facto final. You can override it, if you want, but the standard library will simply ignore your replacement. And because there's no other place you can hook, your subclass will necessarily be inconsistent - the behavior of membership testing for your class will vary depending on whether the code which is doing the testing happens to be in the standard library.

If your definition of subclassing means "replaces a core set of methods and expects everything else to just work".

It's not just the core set of methods. The proposal is that there should be no method which could be replaced which would make baseSet.isSubsetOf(subclassInstance) consume subclassInstance in the way which all other code accepting a subclass of Set would consume it. I am fine with (not enthusiastic about, but fine with) replacing all the methods if that's what it takes to behave consistently. But if a class is subclassable, I expect that replacing all of the methods should be sufficient.

The existence of methods which read [[SetData]] off their argument without passing through any user-hookable method would mean that replacing all of the methods would not be sufficient. Many kinds of Set subclass which today behave consistently would not be able to behave consistently in that world.

@ljharb
Member Author

ljharb commented Jan 18, 2019

Isn't that how the size getter already works?

@bakkot
Collaborator

bakkot commented Jan 18, 2019

The size getter can be overridden.

@ljharb
Member Author

ljharb commented Jan 18, 2019

ah, and in the event that something read the size, you’d want it to use your overridden getter instead of “the number of elements in [[SetData]]”, for example

@bakkot
Collaborator

bakkot commented Jan 18, 2019

Yes. At least in the cases where that something was reading the size of an argument; I would prefer it read the size of its receiver in the same way, so that subclasses would not be forced to override it, but as long as subclasses could override it that would be possible to work around.
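
(For illustration, a hypothetical sketch of a subclass whose overridden size getter matters; a method that counted [[SetData]] entries directly would disagree with it.)

// Hypothetical sketch: a subclass that hides some entries, so its reported
// size (and membership, and iteration) differ from what [[SetData]] holds.
class FilteredSet extends Set {
  #hidden = new Set();
  hide(value) {
    if (super.has(value)) this.#hidden.add(value);
    return this;
  }
  get size() { return super.size - this.#hidden.size; }
  has(value) { return super.has(value) && !this.#hidden.has(value); }
  *[Symbol.iterator]() {
    for (const value of super[Symbol.iterator]()) {
      if (!this.#hidden.has(value)) yield value;
    }
  }
}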

@littledan
Member

I don't think this change means has would be de-facto final. Your code sample would continue to work fine, even if it doesn't get a transparent upgrade to the new feature. The private methods proposal that we have been collaborating on gives exactly the tools for JavaScript to do the kind of things we are talking about doing in Set, because reliable, predictable behavior is genuinely useful in various circumstances.

@bakkot
Collaborator

bakkot commented Jan 18, 2019

@littledan, again, it's not just that it would not automatically work with the new feature, it's that it could not be made to work with it (in a way consistent with its other behavior).

@ajklein

ajklein commented Jan 18, 2019

While I find myself swayed by arguments on both sides of this discussion, I agree with @bakkot that the [[SetData]]-centric approach laid out by @domenic is staking out a significant claim about the subclassability of Set. I think that should be decided on its own merits, rather than happening merely as a consequence of some choice made in a spec algorithm.

@Ginden
Collaborator

Ginden commented Jan 19, 2019

I don't know what to think yet about most of the issues raised in this thread, but I consider these important:

  • All methods should use SpeciesConstructor to create the new set.
  • All methods should allow any iterable argument, not only sets.

@bmeck
Member

bmeck commented Jul 26, 2019

I'm interested in the points brought up here. I was asked to look into this issue after talking about a Map.prototype.updateOrInsert() method. I think categorizing use cases can be enlightening when discussing whether something should delegate to subclass property lookup. I haven't thought as much about specialized Sets, so some of the following will be focused on Maps; apologies.

I find that subclassing can introduce issues when features are conceptually able to be mixed, but a single inheritance chain makes combining them complex. For example, if we have two classes:

// a set that only adds adults
class AdultsSet extends Set {
  add(person) {
    if (person.age < LEGAL_ADULT_AGE) { return; }
    else super.add(person);
  }
}
// a set that only adds people within a specific geographic region
class GeographicSet extends Set {
  #region;
  constructor(region) {
    super(); // a derived class constructor must call super() before touching `this`
    this.#region = region;
  }
  add(person) {
    if (person.region !== this.#region) { return; }
    else super.add(person);
  }
}

It is plausible to want to mix the features of these two Sets to create a third type that only holds adults within a region. I do not think subclassing aids in mixing these types to create that third type; in fact it can hinder it. In particular, these specializations do not change the expected behaviors of a Set, and so they seem more likely than other types to be candidates for mixing.

However, in other scenarios, like a multimap, fundamental parts of the expectations can be confused. A key may still be in the collection even after a successful delete, etc. This does seem like a good place for subclassing, because it doesn't adhere to some expectations but wishes to use the same API surface, for various reasons. Even with these fundamental differences in expectations you could imagine something like an AdultMultiMap that only allows adults. Given the different expectations for the MultiMap, it seems like any specialization, such as limiting to adults, would also need to take that into account.

I am concerned with introducing these subclass design protocols in light of having examples that fundamentally seem to invalidate parts of protocol expectations. I worry that any design constraints expected by the subclassing will either not be enforceable or actively prevent interesting use cases.

In addition, with the exposure of subclassing design protocols we do not have a clean way of mixing disparate inheritance chains. In the past some ways to ease this problem, such as mixins, have been proposed, but those seem to further complicate the story of how to properly design subclasses.
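
(For what it's worth, a hypothetical sketch of the class-factory style of mixin alluded to here; LEGAL_ADULT_AGE and the 'EU' region value are placeholders. It lets the two filters stack, though it pushes the composition question onto whoever assembles the final class.)

// Each specialization is written against an arbitrary base class, so the
// filters can be stacked without committing to one fixed inheritance chain.
const OnlyAdults = Base => class extends Base {
  add(person) {
    if (person.age < LEGAL_ADULT_AGE) return this;
    return super.add(person);
  }
};

const OnlyRegion = region => Base => class extends Base {
  add(person) {
    if (person.region !== region) return this;
    return super.add(person);
  }
};

// A third type that only holds adults within a given region:
class AdultsInEURegionSet extends OnlyAdults(OnlyRegion('EU')(Set)) {}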


On the other side, by using internal fields directly we do not allow the same easy subclassing story, even in isolation, for things like AdultsSet above.

I wonder if we could try to maximize designs such that simple cases (ones that do not have complex protocol needs or fundamentally different behavior from the parent class) can be handled without subclassing. This would put the burden of combining specializations on the top-level base class for most cases, but would still allow complex subclasses to create the necessary specializations. I think expanding the base class could alleviate some of the design complexity, rather than trying to create perfect protocols for how all subclasses should work. For Sets, I would also add that creating a relatively complete design for a protocol is likely more feasible than for Maps, I would imagine.

@bakkot
Collaborator

bakkot commented Sep 17, 2022

There has been a great deal of discussion on this topic, which expanded beyond the original question into the more general question of how we should operate on the receiver and the argument, both in this thread and in committee. The resolution that we came to was basically:

  • subclasses are expected to override all of these methods if they want something other than "use the internal slot and produce a base Set"
    • i.e., these methods read from this.[[SetData]] directly
    • and there is no use of Symbol.species to produce the resulting value; it's always a base Set
  • arguments should be consumed with their public interface, not the internal slots, so you can build a thing which encapsulates a Set rather than extending it, and still pass that thing as an argument to these methods on the base Set type
    • concretely, that means these methods all do eager lookups of .size, .has, and .keys on their argument, and then repeatedly invoke either the has method or the next method produced by calling keys.
  • all of these methods, even union, expect their argument to be a Set or something which looks like it; there's no support for iterables (instead you are expected to wrap an iterable in new Set(it)).

This achieves the constraint that it is possible to pass a duck-typed Set as an argument, which many people (including myself) held to be important, while providing no assistance for subclassing without overriding all of these methods (and therefore entailing no extra complexity toward that end).
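
Concretely, an argument only needs size, has, and keys to satisfy that contract, so a wrapper around a Set (rather than a subclass of it) works fine. A sketch, with a hypothetical name:

// Sketch of a "Set-like" argument: it exposes size, has, and keys,
// but is not itself a Set and does not extend one.
class ReadOnlySetView {
  #backing;
  constructor(backing) {
    this.#backing = backing; // an actual Set being encapsulated
  }
  get size() { return this.#backing.size; }
  has(value) { return this.#backing.has(value); }
  keys() { return this.#backing.keys(); }
}

// Usable as the argument to the new methods (where they are available):
const view = new ReadOnlySetView(new Set([2, 3, 4]));
new Set([1, 2, 3]).intersection(view); // => Set {2, 3}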

See further discussion of these and other decisions in details.md.


This still does not quite achieve the "eagerly convert, then do all computation, then return the result" goal @domenic argued for above, because that goal is impossible while still achieving the optimal time complexity - with methods like intersection, it's necessary to repeatedly invoke arg.has in some cases rather than iterating the entire argument. But it's as close as possible while still achieving both optimal time complexity and the goal of accepting duck-typed Sets: the only observable parts of the algorithms are in the consumption of the argument, rather than in the consumption of this or the creation of the final result, and there's never interleaving of calls to two unrelated observable functions.


Also, on the original question, add and delete are now never invoked. Instead, the [[SetData]] list for the ultimate set is constructed directly. This is actually even less observable than invoking new Set would be, because the Set constructor looks up and invokes the current value of Set.prototype.add.


I'm closing this as resolved. Since this discussion has covered a lot of ground, if you'd like to revisit any of the points I've made in this comment, please open a new issue rather than continuing in this thread.

bakkot closed this as completed Sep 17, 2022
bakkot unpinned this issue Sep 17, 2022