Avoid "delete" and "add"? #50
Note that the Set constructor already creates a contract around having an "add" method. Also, I think it'd be helpful for this issue to split out whether the contract applies to the receiver, to instances of the receiver's species constructor, or to the argument. Contracts which apply only to the receiver or its species constructor seem like a different sort of thing than contracts which apply to arguments.
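That contract is observable today: the Set constructor routes every element of its iterable argument through the (possibly overridden) `add` method. A minimal sketch, with a made-up subclass name:

```js
// Sketch: `new Set(iterable)` calls the instance's `add` for each element,
// so a subclass's override sees every element, duplicates included.
class LoggingSet extends Set {
  add(value) {
    // `add` runs during super(), before field initializers would,
    // so the log array is created lazily.
    (this.added ??= []).push(value);
    return super.add(value);
  }
}

const s = new LoggingSet([1, 2, 2, 3]);
// s.added is [1, 2, 2, 3]; s.size is 3
```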
IMO, there's only one protocol added by this proposal and that's the has-protocol, largely for isSubsetOf (but also intersection) that works directly on the argument/receiver. The rest of these methods work on an object constructed using the SpeciesConstructor, so I wouldn't necessarily classify these as new protocols. I expect these to be subclasses of Set which hold these methods.
The biggest problem with this is: how would this work with subclassing? Also, given the above about how these aren't necessarily new protocols, I'm not sure there's a lot of value in trying to do this.
Since the entire list would be passed into the subclass's constructor, the subclass would handle it there (and the constructor indeed calls "add"). "delete" seems to be a new protocol, though.
@ljharb To be clear, you are proposing that
? And similar changes for the other methods?
I think "filter out duplicates per SameValueZero" can be handled by the constructor, but yep, exactly that. I'm sure we could abstract most of those steps into an abstract op as well, a la AddEntriesFromIterable.
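For what the dedupe step could look like: Set membership already uses SameValueZero, so a plain Set works as the comparison mechanism. A sketch (the function name mirrors the hypothetical FilterSameValueZeroDuplicates op discussed here, not anything in the spec):

```js
// Dedupe a list per SameValueZero: +0/-0 collapse to one entry,
// NaN equals NaN, and the first occurrence wins.
function filterSameValueZeroDuplicates(list) {
  return [...new Set(list)];
}

filterSameValueZeroDuplicates([0, -0, NaN, NaN, 1, 1]); // [0, NaN, 1]
```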
That means that intersection is
I'd love to do this --remove all lookups, operate on an internal list and create a Set finally from this. Three main reasons I didn't do this is: I don't feel too strongly about any of these but I bet the committee does.
Can you be more specific about the changes here?
That's an excellent observation. It'd be great to get rid of this call, definitely.
To make sure we're all on the same page, I was assuming that there would still be lookups for
FWIW I personally wouldn't object: as long as it calls the species constructor with the list it's come up with, rather than manipulating internal slots directly, I think it's sufficiently consistent. No idea what the rest of the committee would think.
If those methods are defined purely in terms of
I think this can also be avoided if you're willing to go with a somewhat hybrid approach, as long as everyone is on board with the "has" thing (on receivers). Something like:

```js
Set.prototype.intersection = function (iterable) {
  let Ctr = SpeciesConstructor(this, Set);
  let bothItems = []; // assume this is a spec list
  for (let element of iterable) {
    if (this.has(element)) {
      bothItems.push(element);
    }
  }
  bothItems = FilterSameValueZeroDuplicates(bothItems);
  return new Ctr(bothItems);
};
```
I am envisioning something like:

```js
Set.prototype.difference = function (iterable) {
  let Ctr = SpeciesConstructor(this, Set);
  let newSet = new Ctr();
  let otherSet = iterable;
  let hasCheck = otherSet.has;
  if (typeof hasCheck !== 'function') {
    otherSet = new Set(iterable);
    hasCheck = otherSet.has;
    if (typeof hasCheck !== 'function') throw new TypeError();
  }
  // assuming we don't take the approach suggested in this issue
  // if we did, presumably we'd build up the list of elements internally,
  // then call Ctr with it
  for (let element of this) {
    if (!hasCheck.call(otherSet, element)) {
      newSet.add(element);
    }
  }
  return newSet;
};
```
I can make a PR so we can all see the diff, if you want, though I can't guarantee I'd get to it for a day or two.
ACK. This matches what I was thinking as well.
LGTM
Looks like we have two allocations here... that seems unfortunate. Although I can probably optimize that away for the common cases.
Sounds great. Thanks!
Do you mean in the
Only Other methods enforce contracts on either
Made a PR: #51.
Hey folks, watching this from the sidelines and I continue to be struck by the over-complexity that we are injecting into the spec, in the name of genericism, subclassability, and the desire to touch the public API as many times as possible in order to give other classes (or other thisArgs) a chance to insert their own behaviors.

As someone who did the same thing with promises in ES2015, and lived to regret it greatly, it's heartbreaking to see those same patterns propagated further. I'd hoped we'd do things differently going forward.

As such, I want to present some alternatives that I've written up. I've done only the

The proposed spec text is at https://gist.github.com/domenic/44773a7610dc8541bf2a83df2a3ce990. The guiding principles, which I think are fairly generally applicable, are:
I'm unsure how others feel about this, but there has been a lot of discussion, so I thought it couldn't hurt to try to put something out there that IMO is a lot better as a way forward.
The over-complexity is exactly the problem I'm struggling with too. I'm worried that we're trying to make this work for everyone's use case, turning this into a bit of a kitchen sink.
I like this approach a lot; it makes everything way simpler. It's straightforward to implement and optimize, but cripples subclassing... which is totally fine by me. I'm slightly worried about this turning into a discussion about subclassing in general (and not about set methods), but I'd like to hear more from others about this approach. @littledan @zenparsing WDYT?
I think "cripples subclassing" is overstating it a bit. It's important to separate subclassing and the desire to touch public APIs. Subclassing still works fine in general. The only thing that might be slightly surprising is that you get back plain Set instances. It's the pattern of going through as many public APIs as possible, in order to allow arguments or thisArgs to customize the algorithm's behavior, that the approach specifically avoids. Instead it treats the [[SetData]] as the source of truth, but any well-behaved subclass (i.e. one that calls super constructor and super methods) will work fine with that constraint.
Interesting distinction. In my mind, the reason to use the public APIs is to allow subclasses to do this exact sort of customization. But I think your point is that subclassing is more about "extending the behavior" rather than "replacing the behavior", in which case you're right, subclassing isn't affected as much. |
I agree that Leaving aside
I don't think "customize the algorithm's behavior" is the right way to think of this. The point of I don't altogether care that much about
I'm probably missing some context, but here are some late night thoughts: Currently, the extensibility point for Set is the "add" method. It provides subclasses with the ability to map elements, filter the range of acceptable elements, etc. If we are going to support subclassing, then my first inclination would be to retain "add" as the only extensibility point, and use For the operations that generate new sets, I would probably start by seeing if I could create a new, empty instance with the species constructor and then directly use "add" to build it up. I would try to avoid creating an array just to pass it into the species constructor. I agree with @domenic insofar as I don't think we should feel compelled to make these algorithms generic over the receiver. We should be able to use
Sure, if you want to accept iterators, you should go through Symbol.iterator, as my second example does. My contention though is that if you want to accept
If what you want to do with a For example, here is a perfectly reasonable Set subclass; it mostly just orders things differently. Why should We already have a protocol for iterating things. If the only thing an algorithm is doing with a value is iterating over its contents, it should use this protocol. Even if there is an internal slot it could use instead.
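The code for the subclass example mentioned in this comment did not survive extraction; as a stand-in, here is a hypothetical subclass in that spirit (the name and details are illustrative, not the original): it keeps [[SetData]] accurate but iterates in a different order.

```js
// Hypothetical subclass: stores elements normally (so the internal data is
// accurate) but iterates in sorted order instead of insertion order.
class SortedSet extends Set {
  *[Symbol.iterator]() {
    yield* [...super[Symbol.iterator]()].sort();
  }
  keys() { return this[Symbol.iterator](); }
  values() { return this[Symbol.iterator](); }
}

[...new SortedSet(['b', 'c', 'a'])]; // ['a', 'b', 'c']
```

Any algorithm that consumes the argument through the iteration protocol sees the sorted order; one that reads internal slots directly would see insertion order, which is the inconsistency being debated here.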
I guess I don't have much to say other than that I disagree. When operating on a Set, [[SetData]] is a better source of truth in my opinion than the monkey-patchable public API. Subclasses which don't keep an accurate [[SetData]] are not, in my opinion, "perfectly reasonable".
I guess I assume that subclasses should not have to care (or think) about the contents of internal slots, as long as the public API does the right thing. That seems almost tautological. That is what "internal" means.
What we're discussing here is precisely what the public API will do. So I think with either design, subclasses that care about making the public API do the right thing will be well-behaved (another tautology).
Why don't the existing Set methods use the iteration protocol? If this isn't a problem for existing methods, why is this a problem now?
If But also, the public API already exists. My example above is currently well behaved. Adding new methods to Set.prototype should not break that unless there's good reason to. And "we want to route around existing protocols for answering the specific question you have" is not a good reason.
None of them take a set as an argument, only a receiver, so a subclass can override their behavior. (But also, they should, and I would very much like to change that.)
That's... kind of an inside-out way of looking at it. What I would say is that it is crucial that a
The Set subclass I gave above would behave inconsistently when passed to the hypothetical
I want to second the voices here arguing for conversion or checks "at the boundary" rather than many observable points throughout the algorithm. Talking about "subclassable builtins" can be confusing: The first part of subclassability is I like the idea of object oriented design, and using method dispatch where it makes sense, but I think our default should be simple, predictable algorithms, and we should call into observable points where we have a particular use case for them. Although ES6 took a strong, opinionated stance towards using these points when it was possible to conceive of them, I think our experience shows that we should be more driven by use cases going forward, for deciding when to add this complexity.

I'm not sure if "performance" or "implementation complexity" is a strong argument for this group, but it's something that comes to mind for me when I think about this issue: I did part of the work on the implementation of ES6 RegExp and Array subclassing in V8, and I deeply regret it. To avoid performance regressions while adding these observable points, we ended up adding "performance cliffs": The algorithm was something like, check whether anyone is observing it, if not take the fast path, otherwise take a new, slow path. Since my work there (which I'm not proud of at all), my colleagues in the V8 team have gone back and sped up many of those slow paths, but the performance difference remains, and an incredible amount of engineering effort went into this project.

Meanwhile, years later, it's still not clear what kind of software engineering benefit those changes brought to programmers. I don't get the impression that lots of people are deliberately subclassing

At the same time, if someone were to implement RegExp subclassing as a JavaScript library, rather than a revision of the built-in behavior, it would be significantly less source code (we're talking maybe 50-100 lines of JS), and they could invoke it where it's useful for them.
It has never been clear to me why JavaScript needs a built-in framework for RegExp-like things, and it's not clear to me why we need such a framework for Set-like things.
I think that iterating things using the iteration protocol is simpler and more predictable than reaching into internal slots. In general I think routing around existing, well-known points of extensibility is always going to be confusing.
I agree that we should not have decided that (Incidentally, I've subclassed
This is really interesting. I'd like to hear more about your use cases for subclassing
@littledan I gave one example above, derived from something I actually did (which was ordering things in a slightly different way). I would expect passing instances of this to Also, if I have multiple such subclasses coming from different places, I expect them to play nice with each other - for example, I would expect As another example, I've made one-off implementations of MultiMap a bunch of times, though I don't think I've ever had occasion for MultiSet. I have a Set which updates its iteration order when queried with I'm sure there's others I'm forgetting. There's a fairly common theme, though: these things customize (These methods also would break when a subclass instance was used as their receiver, which means my subclasses would be incoherent until I updated them, but that's something I could eventually patch over by overriding those methods as well. I don't see any reason why they should break, but it's fixable.) @ljharb The only ones I can recall for which
why wouldn't rekey allow distinguishing -0 and 0? could you not do something like this?

```js
function rekey(key) {
  if (typeof key === 'string') {
    return `string-${key}`;
  }
  if (typeof key === 'number' && key === 0) {
    return `zero-${Object.is(key, -0) ? 'negative' : 'positive'}`;
  }
  return key;
}
```
Ah, I didn't realize
Assuming that it maps back on every operation into [[SetData]], it seems to me like it would reduce the number of needed extension points to virtually just the constructor, and possibly
I don't know that using
What I mean is, when rekey lands, perhaps this proposal could only use SetData (and the iteration protocol, as needed), since most of the reasons to override, say,
I don't think all of my use cases for overriding
Sorry for being pedantic, but I don't want this to be misconstrued by anyone reading your comment without the whole context (which can often happen),
That's true only if your definition of subclassing means "replaces a core set of methods and expects everything else to just work". As others have chimed in here, that's not the obvious definition for a lot of folks, and there are several other use cases of subclassing that continue to work fine (for example, when you want to extend a Set).
Sorry, yes, there are still certain kinds of subclass which work. It just means that
It's not just the core set of methods. The proposal is that there should be no method which could be replaced which would make The existence of methods which read [[SetData]] off their argument without passing through any user-hookable method would mean that replacing all of the methods would not be sufficient. Many kinds of
Isn't that how the
The |
ah, and in the event that something read the size, you’d want it to use your overridden getter instead of “the number of elements in [[SetData]]”, for example |
Yes. At least in the cases where that something was reading size of an argument; I would prefer it read the size of its receiver in the same way, so that subclasses would not be forced to override it, but as long as subclasses could override it that would be possible to work around.
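Concretely, `size` is an accessor on `Set.prototype` rather than an own data property, so a subclass can override it and anything reading `size` through the public API sees the override. A contrived sketch (the subclass name and behavior are made up for illustration):

```js
// `size` is a prototype getter, so it is overridable; `super.size`
// still reads the built-in element count. HidingSet is hypothetical.
class HidingSet extends Set {
  #hidden = 0;
  hide(v) {
    if (this.delete(v)) this.#hidden++;
    return this;
  }
  get size() {
    return super.size + this.#hidden; // hidden elements still counted
  }
}

const h = new HidingSet([1, 2, 3]).hide(2);
h.size;   // 3 (the override counts the hidden element)
h.has(2); // false
```

An algorithm that reads the argument's `size` property sees 3 here; one that counts [[SetData]] directly sees 2, which is exactly the divergence under discussion.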
I don't think this change means
@littledan, again, it's not just that it would not automatically work with the new feature, it's that it could not be made to work with it (in a way consistent with its other behavior).
While I find myself swayed by arguments on both sides of this discussion, I agree with @bakkot that the
I don't know what to think yet on most of issues raised in this thread, but I consider these important:
I'm interested by the points brought up here. I was asked to look into this issue after talking about a

I find that subclassing can introduce some issues, at least when looking at features that are conceptually able to be mixed but where having a single inheritance chain makes them complex. For example, if we have 2 classes:

```js
// a set that only adds adults
class AdultsSet extends Set {
  add(person) {
    if (person.age < LEGAL_ADULT_AGE) { return; }
    else super.add(person);
  }
}

// a set that only adds people within a specific geographic region
class GeographicSet extends Set {
  #region;
  constructor(region) {
    super();
    this.#region = region;
  }
  add(person) {
    if (person.region !== this.#region) { return; }
    else super.add(person);
  }
}
```

It is plausible to want to mix the features of these 2 Sets to create a 3rd type that only has adults within a region as elements. I do not think subclassing aids in mixing these types to create a 3rd type, but in fact can hinder it. In particular, the specialization of these subclasses does not relate to having a change to expected behaviors, and thus they seem more likely than other types to wish to be mixed.

However, in other scenarios like a multimap, fundamental parts of expectations can be confused. A key may still be in the collection even after a successful delete, etc. This does seem like a good place for subclassing because it doesn't adhere to some expectations, but wishes to use the same API surface for various reasons. Even with these fundamental differences in expectations you could imagine something like having

I am concerned with introducing these subclass design protocols in light of having examples that fundamentally seem to invalidate parts of protocol expectations. I worry that any design constraints expected by the subclassing will either not be enforceable or actively prevent interesting use cases. In addition, with the exposure of subclassing design protocols we do not have a clean way of mixing disparate inheritance chains. In the past some ways to ease this problem, such as mixins, have been proposed, but they seem to further complicate the story of how to properly design subclasses.

On the other side, by using internal fields directly we do not allow the same easy subclassing story in isolation for things like

I wonder if we could try and maximize designs such that simple subclasses that do not have complex protocol needs or fundamentally different aspects than their parent class can be solved without subclassing. This would put the burden of combining specialization up at the top level Base class for most cases, but would still allow complex subclasses to create necessary specializations.
I think expanding the base class could potentially alleviate some of the design complexity rather than trying to create perfect protocols for how all subclasses should work. For sets, I would also add that creating a relatively complete design for a protocol is likely more feasible than for Maps, I would imagine.
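To illustrate "putting the burden of combining specialization at the top level": instead of two parallel inheritance chains, one hypothetical subclass could take predicates as plain data, letting the adult and region constraints from the examples above be mixed freely. Everything here (FilteredSet, the predicate helpers) is invented for illustration.

```js
// Hypothetical: composition via predicates instead of parallel subclasses.
class FilteredSet extends Set {
  #predicates;
  constructor(predicates = [], iterable = []) {
    super(); // deliberately empty, so add() only runs after #predicates is set
    this.#predicates = predicates;
    for (const v of iterable) this.add(v);
  }
  add(value) {
    if (this.#predicates.every(p => p(value))) super.add(value);
    return this;
  }
}

const LEGAL_ADULT_AGE = 18; // assumed constant from the example above
const isAdult = p => p.age >= LEGAL_ADULT_AGE;
const inRegion = region => p => p.region === region;

const adultsInEU = new FilteredSet([isAdult, inRegion('EU')], [
  { age: 20, region: 'EU' },
  { age: 15, region: 'EU' },
  { age: 30, region: 'US' },
]);
adultsInEU.size; // 1
```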
There has been a great deal of discussion on this topic, which expanded beyond the original question into the more general question of how should we operate on the receiver and the argument, both in this thread and in committee. The resolution that we came to was basically
This achieves the constraint that it is possible to pass a duck-typed Set as an argument, which many people (including myself) held to be important, while providing no assistance for subclassing without overriding all of these methods (and therefore entailing no extra complexity toward that end). See further discussion of these and other decisions in details.md. This still does not quite achieve the "eagerly convert, then do all computation, then return the result" goal @domenic argued for above, because that goal is impossible while still achieving the optimal time complexity - with methods like

Also, on the original question, I'm closing this as resolved. Since this discussion has covered a lot of ground, if you'd like to revisit any of the points I've made in this comment, please open a new issue rather than continuing in this thread.
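A rough polyfill-flavored sketch of that resolved shape (the helper names here are made up, not the spec's): the argument is validated once "at the boundary" into a record of `size`/`has`/`keys`, and the algorithm then uses only that record, so any duck-typed set-like object works as the argument.

```js
// Hypothetical boundary conversion: the argument need only be "set-like".
function getSetRecord(obj) {
  if (Object(obj) !== obj) throw new TypeError('expected a set-like object');
  const size = Number(obj.size);
  const { has, keys } = obj;
  if (Number.isNaN(size) || typeof has !== 'function' || typeof keys !== 'function') {
    throw new TypeError('expected a set-like object');
  }
  return { size, has: v => has.call(obj, v), keys: () => keys.call(obj) };
}

function intersection(set, other) {
  const rec = getSetRecord(other); // single up-front validation
  const result = new Set();
  for (const v of set) {
    if (rec.has(v)) result.add(v); // membership via the captured `has`
  }
  return result;
}

[...intersection(new Set([1, 2, 3]), new Set([2, 3, 4]))]; // [2, 3]
```

A plain object like `{ size: 1, has: v => v === 2, keys: () => [2][Symbol.iterator]() }` works as the second argument here too, which is the duck-typing property described above.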
After the latest round of changes, I wanted to start fresh and summarize my updated thought process:
Produce a new set
These species-construct a new set from the receiver.
"Addable"
These create a contract around having an "add" method.
"Queryable"
These create a contract around having a "has" method.
"Deletable"
These create a contract around having a "delete" method.
Generic receiver
These work with a non-Set iterable receiver:
It seems like it'd be nicer to have only one string-based protocol added by this proposal instead of three. Is there any way we could build up an iterable of "items to add" in all addable/deletable cases, and construct the new set with that, rather than needing the "add" or "delete" method?