Editorial: Reference leading/trailing surrogate definitions more #1532
I generally like the refactoring here, as it increases the encapsulation of ECMAScript's version of UTF-16 decoding. However, the lines that compare
Also, it seems a bit odd that, because
I like @ljharb's original suggestion where the abstract op returned the concatenation or a List of the contributing code units. (Not sure what you'd call it, maybe
The operations have similar but distinct requirements.
The 1) CodePointAt operation I've just added reports the code point and lets the operations that need a code unit count infer it by comparing the code point to the maximum single-code-unit value. That is a bit messy, but IMO no worse than 2) $hardToName returning a List of code units, which would let the operations that need a code unit count get it from the number of elements and the operations that need a code point get it from UTF16Decode. I suppose the other alternatives would be 3) CodePointAt returning a Record with both kinds of data, or 4) $hardToName2 returning just a count of code units.
I like option 3, but it seems like overkill to me, so I have fallen back on 1. If there is consensus on another option, though, I'll make the change.
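To make the trade-off between options 1 and 2 concrete, here is a rough TypeScript sketch. It is not spec text: the function names (`codePointAtOption1`, `codeUnitCountFromCodePoint`, `contributingCodeUnits`) are hypothetical, and it assumes `position` is a valid index into the string.

```typescript
// Hypothetical helpers for illustration only; not the spec's abstract operations.
const isLeadingSurrogate = (cu: number): boolean => cu >= 0xd800 && cu <= 0xdbff;
const isTrailingSurrogate = (cu: number): boolean => cu >= 0xdc00 && cu <= 0xdfff;

// Combine a valid surrogate pair into a single code point.
function utf16Decode(lead: number, trail: number): number {
  return (lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000;
}

// Option 2: return the code units that contribute to the code point at `position`.
// Callers read the count from `.length` and decode the code point themselves.
function contributingCodeUnits(s: string, position: number): number[] {
  const first = s.charCodeAt(position);
  if (!isLeadingSurrogate(first) || position + 1 >= s.length) return [first];
  const second = s.charCodeAt(position + 1);
  return isTrailingSurrogate(second) ? [first, second] : [first];
}

// Option 1: return only the code point.
function codePointAtOption1(s: string, position: number): number {
  const units = contributingCodeUnits(s, position);
  return units.length === 2 ? utf16Decode(units[0], units[1]) : units[0];
}

// Option 1 callers that need a count infer it from the code point's magnitude:
// anything above 0xFFFF must have come from two code units.
function codeUnitCountFromCodePoint(cp: number): number {
  return cp > 0xffff ? 2 : 1;
}
```

Under option 1 the count is recovered by a comparison against 0xFFFF; under option 2 the code point is recovered by a second decoding step. Either way, one kind of caller has to do extra work.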
Yeah, Encode is the real problem, because it's the only one that needs information at multiple "levels".
I suspect it wouldn't be that bad.
You could also fold in Encode's lone surrogate checking (another leak in the abstraction): a field of the record could indicate whether a properly-encoded code point was found. That would almost completely encapsulate ECMAScript's version of UTF-16 decoding.
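A sketch of what such a record-returning operation might look like, in TypeScript rather than spec prose. The field names (`codePoint`, `codeUnitCount`, `isUnpairedSurrogate`) and the function name `codePointAtRecord` are assumptions made for illustration, and `position` is assumed to be a valid index.

```typescript
// Hypothetical record shape; the field names are illustrative assumptions.
interface CodePointRecord {
  codePoint: number;            // decoded code point, or the lone surrogate's value
  codeUnitCount: number;        // 1 or 2 code units consumed
  isUnpairedSurrogate: boolean; // true when a surrogate had no valid partner
}

function codePointAtRecord(s: string, position: number): CodePointRecord {
  const first = s.charCodeAt(position);
  const firstIsLead = first >= 0xd800 && first <= 0xdbff;
  const firstIsTrail = first >= 0xdc00 && first <= 0xdfff;

  // Not a surrogate at all: a single code unit is the code point.
  if (!firstIsLead && !firstIsTrail) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: false };
  }
  // A trailing surrogate first, or a leading surrogate at the end of the string.
  if (firstIsTrail || position + 1 >= s.length) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  const second = s.charCodeAt(position + 1);
  // A leading surrogate that is not followed by a trailing surrogate.
  if (second < 0xdc00 || second > 0xdfff) {
    return { codePoint: first, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  // A well-formed surrogate pair.
  const codePoint = (first - 0xd800) * 0x400 + (second - 0xdc00) + 0x10000;
  return { codePoint, codeUnitCount: 2, isUnpairedSurrogate: false };
}
```

With this shape, a caller such as Encode could check `isUnpairedSurrogate` instead of re-testing surrogate ranges itself, and iteration-style callers could advance by `codeUnitCount` without comparing against 0xFFFF.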
referenced this pull request
May 17, 2019
OK, updated CodePointAt to return a Record: 21e0abb...f2f6a60
The logic now uses multi-statement conditional blocks to avoid duplicating the unpaired surrogate return value, but could also be re-linearized upon request:
Yeah, I like this approach.
Personally, I think the suggested linearization would make the operation easier to understand. Duplicating the unpaired return doesn't bother me.
One thing that I think might increase readability would be to put each 'return' on its own (sub)step, because then: