Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incomplete amount of exposed state on %SegmentIteratorPrototype% #54

Closed
gibson042 opened this issue Nov 5, 2018 · 10 comments
Closed

Incomplete amount of exposed state on %SegmentIteratorPrototype% #54

gibson042 opened this issue Nov 5, 2018 · 10 comments

Comments

@gibson042
Copy link
Collaborator

gibson042 commented Nov 5, 2018

It's a bit strange for each segmentIterator to expose some but not all of the data returned by its next method. I'd be in favor of dropping breakType, or—if keeping it serves some important purpose—replacing both it and position/index with an accessor that echoes the most recent object from next.

@littledan
Copy link
Member

littledan commented Nov 5, 2018

Could you say why (beyond that it's "strange")? I'd like to make that iterator return objects that are not invalidated by future next() calls.

@gibson042
Copy link
Collaborator Author

Well, none of the ECMAScript built-in iterators expose any state at all beyond a next method returning ephemeral results, and that pattern should only be broken with good cause. There's a rough analog in the form of lastIndex on RegExp instances (which predates ES iterators), but even that is limited to next start position and includes no information about the last match.

What makes this API so special that it demands auto-memoization of next results, and only covering a subset of their data at that?

@littledan
Copy link
Member

One reason is that we need preceding and following methods, see #9 . Another is performance concerns (I am having trouble finding that thread). Please, you can disagree, but don't assume that all of this is just in place by accident.

@gibson042
Copy link
Collaborator Author

gibson042 commented Nov 6, 2018

%SegmentIteratorPrototype%.breakType was added without explanation in 7f8b345 two years ago, and it's hard to debate unstated reasons. But I don't assume that it was accidental, and I'm sorry if i gave that impression... I'm just suggesting that behavior shouldn't diverge from analogous APIs like %StringIterator% without explicit justification.

I don't dispute that arbitrary-index preceding and following methods can be useful, although to be honest I'd have a hard time coming up with sufficient justification and would love to see a realistic application in the FAQ. But although even next requires internal position tracking, none of the three methods need or even benefit from exposing it on the iterator itself (as opposed to only in the iterator results), let alone taking the further step of memoizing breakType.

As for performance, I don't want to get into that guessing game but will observe that if bypassing object allocations were a strong concern (which I'd argue against anyway), then all the result properties should be mirrored as accessors on the iterator, rather than just two of them (in particular, segment itself is currently absent).

@littledan
Copy link
Member

If you want to avoid these kinds of impressions, you could start by asking why, rather than filing a bug claiming incorrectness.

Segment is just a convenience property, which you can calculate based on the string, the position before, and the position after. It is omitted to avoid that allocation.

PRs welcome to improve the documentation to summarize the result of #9.

@gibson042 gibson042 changed the title Incorrect amount of exposed state on %SegmentIteratorPrototype% Incomplete amount of exposed state on %SegmentIteratorPrototype% Nov 6, 2018
@gibson042
Copy link
Collaborator Author

If you want to avoid these kinds of impressions, you could start by asking why, rather than filing a bug claiming incorrectness.

Updated to replace "incorrect" with the more accurate "incomplete". But I try not to phrase issues as questions because the resolution shouldn't be an answer, it should be either an update to the explainer or an update to the spec text—and it's impossible to determine which without an issue to capture discussion. I'm happy to adapt to whatever patterns you prefer, though... where would you like to see such questions?

PRs welcome to improve the documentation to summarize the result of #9.

You keep asking me to submit PRs explaining decisions, but I can't do that for decisions that didn't come with reasons. #9 requested preceding and following, but there is no example code showing how those methods pay for theirselves in realistic situations. And this issue isn't even about that, it's mostly about %SegmentIteratorPrototype%.breakType (which sprang into existence with no GitHub discussion at all).

Segment is just a convenience property, which you can calculate based on the string, the position before, and the position after. It is omitted to avoid that allocation.

The iterator object has internal slots for position and break type, and everything else is derived from those—the iteration result currently has explicit segment, breakType, and position data properties, and (the topic of this issue) the iterator itself has position and breakType getters. Both of those accessors seem to be convenience properties, but no allocations take place until they are invoked (which seems to be moot anyway, since CreateIterResultObject itself necessitates allocations).

I'll open some PRs to clarify what I'm talking about.

@littledan
Copy link
Member

I'm not trying to put the burden on you to make PRs, though I'd really appreciate your help. If no one gets around to it, I hope to eventually come back and do it.

I don't actually understand what's incomplete about #9, or what kind of thing would make them "pay for themselves". What makes them expensive?

About breakType, the rationale is to enable segmentation, with the user checking the breakType (e.g., soft vs hard line break) without the overhead of the iteration protocol and also with the flexibility of preceding and following methods.

@gibson042
Copy link
Collaborator Author

I don't actually understand what's incomplete about #9

Nothing. This issue is totally unrelated to #9. It is about duplicating information from iteration results on the iterator itself—specifically, breakType and position but not segment.

or what kind of thing would make them "pay for themselves". What makes them expensive?

They are expensive in terms of the cognitive burden and spec complexity of Intl segment iterators being different from ES string iterators and every other built-in iterator, none of which directly expose state.

About breakType, the rationale is to enable segmentation, with the user checking the breakType (e.g., soft vs hard line break) without the overhead of the iteration protocol and also with the flexibility of preceding and following methods.

That sounds like premature optimization, which someone once called "the root of all evil (or at least most of it) in programming". The benefit of avoiding object allocations comes at the cost of introducing significant internal inconsistency in the form of properties that have no analogues on otherwise similar iterators and—in the case of position—an entirely different meaning from the common and already-established convention of identifying an index after the last match.

Couldn't we at least try the simple conventional interface first? If this complexity is actually worthwhile, then perhaps it should be added across the board rather than limited to a single built-in iterator.

@littledan
Copy link
Member

Cc @sebmarkbage who raised the performance issue IIRC

gibson042 added a commit to gibson042/proposal-intl-segmenter that referenced this issue Nov 8, 2018
gibson042 added a commit to gibson042/proposal-intl-segmenter that referenced this issue Nov 9, 2018
@littledan
Copy link
Member

I removed segment, so this issue should be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants